ABSTRACT

We explore the possible role of evolution in the analysis of data on SNe Ia
at cosmological distances. First, using a variety of simple sleuthing techniques,
we find evidence that the properties of the high and low redshift SNe Ia
observed so far differ from one another. Next, we examine the effects of
allowing for an uncertain amount of evolution in the analysis, using two simple
phenomenological models for evolution and prior probabilities that express a
preference for no evolution but allow it to be present. One model shifts the
magnitudes of the high redshift SNe Ia relative to the low redshift SNe Ia
by a fixed amount. A second, more realistic, model introduces a continuous
magnitude shift of the form $\delta m(z) = \beta \ln(1 + z)$ to the SNe Ia sample.
The result is that cosmological models and evolution are highly degenerate
with one another, so that the incorporation of even very simple models for
evolution makes it virtually impossible to pin down the values of $\Omega_M$ and
$\Omega_\Lambda$, the density parameters for nonrelativistic matter and for the cosmological
constant, respectively. The Hubble constant, $H_0$, is unaffected by evolution.
We evaluate the Bayes factor for models with evolution versus models without 
evolution, which, if one has no prior predilection for or against evolution, is 
the odds ratio for these two classes of models. The resulting values are always 
of order one, in spite of the fact that the models that include evolution have 
additional parameters; thus, the data alone cannot discriminate between the two 
possibilities. Simulations show that simply acquiring more data of the same type 
as are available now will not alleviate the difficulty of separating evolution from 
cosmology in the analysis. What is needed is a better physical understanding of 
the SN Ia process, and the connections among the maximum luminosity, rate
of decline, spectra, and initial conditions, so that physical models for evolution 
may be constructed, and confronted with the data. Moreover, we show that 
if SNe Ia evolve with time, but evolution is neglected in analyzing data, then,
given enough SNe Ia, the analysis homes in on values of $\Omega_M$ and $\Omega_\Lambda$ which
are incorrect. Using Bayesian methods, we show that the probability that the
cosmological constant is nonzero (rather than zero) is unchanged by the SNe
Ia data when one accounts for the possibility of evolution, provided that we do
not discriminate among open, closed and flat cosmologies a priori. The case for
nonzero cosmological constant is stronger if the Universe is presumed to be flat,
but still depends sensitively on the degree to which the peak luminosities of SNe
Ia evolve as a function of redshift.

Subject headings: cosmology: observations — distance scale — statistics — 
supernovae: general 

1. Introduction 

The realization that the rates of decline of the brightnesses of Type Ia supernovae (SNe
Ia) are correlated with their peak luminosities (Phillips 1992) has led to renewed efforts
to use them as cosmological distance markers (Hamuy et al. 1995, 1996a, 1996b; Riess,
Press & Kirshner 1995, 1996). Ongoing searches for high redshift ($z \sim 0.5 - 1$) SNe Ia
have employed phenomenological models for these correlations to constrain the variation
of luminosity distance $D_L(z)$ with redshift; the results have been interpreted to imply the
existence of a nonzero cosmological constant (Perlmutter et al. 1998, hereafter P98; Riess et
al. 1998, hereafter R98). Moreover, the results appear to rule out the simplest version of
a flat cosmology, in which the density parameter for "ordinary" matter (including as yet
unidentified nonbaryonic material) $\Omega_M = 1$ and the density parameter for the cosmological
constant $\Omega_\Lambda = 0$.

Although the logical possibility that $\Omega_\Lambda \sim 1$ today has long been recognized (Einstein
1917), it is anathema to many theorists, since the associated vacuum energy density must
be $\rho_{\rm vac} \sim 10^{-123} M_P^4$, where $M_P$ is the Planck mass. (Theoretical and conceptual problems
with a nonzero cosmological constant so small compared with its "natural" scale have been
reviewed by Weinberg 1989 and Carroll, Press & Turner 1992.) On the other hand, there
is some evidence from large scale structure simulations that $\Omega_\Lambda \neq 0$ in a flat Universe
fits the observations well (Cen 1998). Conceivably, what we interpret as a cosmological
"constant" might be an evolving field (e.g., Caldwell, Dave & Steinhardt 1998; Garnavich
et al. 1998; Perlmutter, Turner & White 1999). In any case, a convincing demonstration
that the expansion rate of the Universe is increasing would have a revolutionary impact on
our understanding of fundamental physics.

In view of the importance of the potential discovery of a nonzero cosmological constant,
we have undertaken an independent study of the published data in an effort to understand
their implications better. Our motivations are both phenomenological and astrophysical
(and may end up being related ultimately). On the phenomenological side, we note that
three different analysis methods are used to compute distances: the multicolor light curve
shape (MLCS) method (Riess, Press & Kirshner 1995, 1996), the $\Delta m_{15}$ or template fitting
(TF) method (Phillips 1992; Hamuy et al. 1995, 1996a, 1996b), and the stretch factor (SF)
method (Perlmutter et al. 1997). None of these methods is a perfect description of reality.
As we will show, they are not always in agreement, and there seems to be no physical or
phenomenological reason to prefer one to the other.

On the astrophysical side, we note that there are processes such as evolution of the
supernova sample that can mimic the effects of cosmology at high redshifts and which are
extremely difficult to constrain convincingly with the current data. Therefore, it is useful
to ask at what level the current data are able to distinguish the effects of cosmology from
these other processes. We find that allowing for the possibility of a redshift-dependent shift
in SNe Ia peak magnitudes (such that the most distant observed SNe Ia are dimmed by
~ 0.2 to 0.3 mag) renders $\Omega_M = 1$ and $\Omega_\Lambda = 0$ acceptable, and that this is true for a variety
of phenomenological models for the evolution. We also present results of simulations that
show that if SNe Ia luminosities evolve with redshift, but evolution is neglected in analyzing
the data, then, given enough data, the analyses will settle on precisely determined, but
incorrect, values for $\Omega_M$ and $\Omega_\Lambda$, and that the incorrectness of the model will not be
detectable with a standard goodness-of-fit test. However, we find that the Hubble
constant, $H_0$, is virtually unaffected by evolution.

We believe that it is unjustifiable to try to determine cosmological parameters $\Omega_M$ and
$\Omega_\Lambda$ from data on "standardized" candles, such as the peak luminosities of SNe Ia, without
allowing for the possibility of source evolution. Our attitude is that an uncertain amount
of evolution must be presumed to occur, as a default; and the sensitivity of the results to
the uncertainty must be studied. Hopefully, one can demonstrate from the data that source
evolution is absent or negligible. If that turns out to be untrue, as recent examination of
the risetimes of the light curves of the SNe Ia sample has preliminarily indicated (Riess,
Filippenko, Li, & Schmidt 1999), one might hope instead to constrain the parameters in
an evolutionary model along with the cosmological parameters. Optimistically, one would
anticipate that this might be accomplished once enough data are acquired. We argue,
from simulations employing simple, phenomenological models, that such optimism may be
unrealistic. What is needed is a better physical understanding of the SN Ia process and
its evolution with redshift, before cosmological parameters can be determined reliably from
SNe Ia catalogues. Such an understanding is currently being sought by theorists (see, e.g.,
von Hippel, Bothun, & Schommer 1997; Höflich, Wheeler, & Thielemann 1998; Domínguez
et al. 1999).

In using observations of SNe Ia to determine cosmological parameters, the raw data
are combined by any of the three methods mentioned above to derive single parameter
summaries: the distance moduli to the sources. When a single catalogue of data is
subjected to different types of analysis, each of which derives one quantity per source, the
results of the individual analyses need not agree with one another entirely, and there is
information contained in the degree to which the answers derived by the different methods
differ. In the example under consideration, the different analysis methods might probe
slightly different physical aspects of the SN Ia mechanism, and their relationships to
reality and to one another may be different at high and low redshift. Indeed, one clue
that evolutionary effects are important would be a systematic drift with increasing redshift
between the distance moduli implied by the MLCS and TF methods for the SNe Ia observed
and analyzed by R98, where identical SNe Ia data are subjected to two different analysis
methods.

In order to set the scale of interest for our investigations of potential systematic
effects in these data we plot, in Figure 1, the joint credible regions for $\Omega_M$ and $\Omega_\Lambda$ for the
largest available data set (P98), analyzed as published (Figure 1a) and after introducing
a systematic offset to all the high redshift distance moduli ($z > 0.15$) of $-0.1$ magnitudes
(Figure 1b). We see that a correlated systematic shift of this size would have a major
impact on the interpretation of the data. A nonzero cosmological constant would still be
favored, but the statistical significance of the result would be much reduced.
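
To give a sense of scale, a shift of $\delta\mu$ magnitudes in distance modulus corresponds to a multiplicative change of $10^{\delta\mu/5}$ in the inferred luminosity distance, so 0.1 mag is roughly a 5% effect. The following minimal Python sketch applies a uniform offset above a redshift cut; the function names and the sample values are illustrative assumptions and are not data from P98.

```python
def apply_offset(z, mu, offset=-0.1, z_cut=0.15):
    """Return distance moduli with `offset` (mag) added wherever z > z_cut."""
    return [m + offset if zi > z_cut else m for zi, m in zip(z, mu)]


def distance_ratio(delta_mu):
    """Factor by which the inferred luminosity distance changes for a shift delta_mu (mag)."""
    return 10.0 ** (delta_mu / 5.0)


# Made-up (z, mu) pairs, for illustration only.
z = [0.05, 0.10, 0.40, 0.80]
mu = [36.7, 38.3, 41.8, 43.6]
shifted = apply_offset(z, mu)
```

Only the two high redshift entries are altered; `distance_ratio(-0.1)` gives the corresponding fractional change in distance.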

In Section 2 we review the published data that we employ in our study, as well as the
salient features of the light curve fitting methods. In Section 3 we compare the different
fitting methods on a supernova by supernova basis where possible. In the subsequent section
we explore whether the data have sufficient shape information to distinguish effects of
cosmology from other effects such as evolution. We summarize our conclusions in the
concluding section.



2. Measurements of $\Omega_M$ and $\Omega_\Lambda$ Using Type Ia Supernovae

The traditional measure of distance to a SN is its distance modulus, $\mu = m_{\rm bol} - M_{\rm bol}$,
the difference between its bolometric apparent magnitude, $m_{\rm bol}$, and its bolometric absolute
magnitude, $M_{\rm bol}$. In the Friedmann-Robertson-Walker (FRW) cosmology, when the (relative)
peculiar velocity of the source is negligible, the distance modulus is determined by the
source's redshift, $z$, according to

$$\mu = 5 \log\left[\frac{D_L(z; \Omega_M, \Omega_\Lambda, H_0)}{1\ {\rm Mpc}}\right] + 25 = f(z; \Omega_M, \Omega_\Lambda, H_0). \qquad (1)$$

Here the luminosity distance $D_L(z; \Omega_M, \Omega_\Lambda, H_0) = c H_0^{-1} d_L(z; \Omega_M, \Omega_\Lambda)$, where $c$ is the
velocity of light, $H_0$ is Hubble's constant at the present epoch, and the dimensionless
luminosity distance from redshift $z$ is

$$d_L(z; \Omega_M, \Omega_\Lambda) = (1 + z)\,|\Omega_k|^{-1/2}\, {\rm sinn}\left\{ |\Omega_k|^{1/2} \int_0^z dz' \left[ (1 + z')^2 (1 + \Omega_M z') - z'(2 + z')\,\Omega_\Lambda \right]^{-1/2} \right\}, \qquad (2)$$

with $\Omega_k = 1 - \Omega_M - \Omega_\Lambda$, and ${\rm sinn}(x) = \sinh(x)$ for $\Omega_k > 0$ and $\sin(x)$ for $\Omega_k < 0$ (e.g.,
Carroll, Press & Turner 1992). In principle, one could infer the cosmological parameters
$H_0$, $\Omega_M$, and $\Omega_\Lambda$ from the distribution of measured distance moduli of sources at a variety
of redshifts.
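
As an illustration of how equations (1) and (2) can be evaluated numerically, here is a minimal Python sketch. It is not the code used in any of the analyses discussed here. The Simpson's-rule integrator, the function names, the default value of the Hubble constant, and the explicit treatment of the flat case (the limit in which the $|\Omega_k|$ factors cancel) are all assumptions made for this example.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def sinn(x, omega_k):
    """sinh for an open model, sin for a closed model, identity when flat."""
    if omega_k > 0:
        return math.sinh(x)
    if omega_k < 0:
        return math.sin(x)
    return x


def d_l(z, omega_m, omega_l, n=1000):
    """Dimensionless luminosity distance of equation (2), by Simpson's rule (n even)."""
    def integrand(zp):
        return ((1 + zp) ** 2 * (1 + omega_m * zp) - zp * (2 + zp) * omega_l) ** -0.5

    h = z / n
    total = integrand(0.0) + integrand(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3

    omega_k = 1 - omega_m - omega_l
    if abs(omega_k) < 1e-12:  # flat limit (assumed handling): |Omega_k| factors cancel
        return (1 + z) * integral
    root = math.sqrt(abs(omega_k))
    return (1 + z) * sinn(root * integral, omega_k) / root


def distance_modulus(z, omega_m, omega_l, h0=65.0):
    """Equation (1): mu = 5 log10(D_L / 1 Mpc) + 25, with h0 in km/s/Mpc (assumed default)."""
    d_mpc = C_KM_S / h0 * d_l(z, omega_m, omega_l)
    return 5 * math.log10(d_mpc) + 25
```

Two closed forms provide a check: for $\Omega_M = 1$, $\Omega_\Lambda = 0$ the integral gives $d_L = 2(1 + z)[1 - (1 + z)^{-1/2}]$, and for $\Omega_M = \Omega_\Lambda = 0$ it gives $d_L = z + z^2/2$.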

Several factors complicate implementation of such an analysis. In reality, bolometric
data are not available, and one must infer $\mu$ using magnitudes $m_X$ and $M_X$ in some
bandpass, $X$. The bandpass maps to a different region of the spectrum as a function of
redshift $z$, so $\mu$ cannot be calculated simply by taking the difference between band-limited
magnitudes; a $K$-correction term must be added whose value depends not only on the
source's redshift, but also on its spectrum. In addition, extinction along the line of sight
increases the apparent magnitude by some amount $A_X$ not due to the cosmological effects
modelled in equation (1). Further, the absolute magnitude of the source (bolometric or
band-limited) is not directly measured, but must instead be inferred from other source
properties. Finally, the inevitable presence of statistical uncertainties and peculiar velocities
further complicates straightforward use of equation (1).

Of these complications, the need to infer the absolute magnitude indirectly is the most
troublesome. Ideally, one seeks a population of "standard candles" such that all members
of the population have the same $M$ (for convenience we henceforth drop the subscript $X$).
If such a population could be identified, the parameters $\Omega_M$ and $\Omega_\Lambda$ could be inferred even
if the actual value of $M$ for the population were unknown (the remaining parameter, $H_0$,
would remain undetermined). Historically, all attempts to identify such a population have
failed. Particularly worrisome is the possibility that some classes of objects that appear to
be approximately standard candles locally (at low redshift, where they can be studied in
detail) have evolved significantly, so that their younger counterparts at high redshifts have
different absolute magnitudes, thwarting their use as cosmological distance indicators.

SNe Ia were briefly considered promising candidates for standard candles, but observers
quickly discovered that SNe Ia are not all identically bright (Branch 1987; Barbon, Rosino,
& Iijima 1989; Phillips et al. 1987, 1992; Filippenko et al. 1992a, 1992b; Leibundgut et al.
1993). The intrinsic dispersion in the peak absolute magnitudes of SNe Ia, determined from
studies of nearby events, is approximately 0.3 - 0.5 mag (Schmidt et al. 1998). However,
there is an apparent empirical correlation between the rate of decline of the light curve of a
given SN Ia and its luminosity at maximum brightness that was first quantified by Phillips
(1992). Various techniques have been developed to take advantage of this correlation
to determine the absolute magnitudes of individual supernovae using their light curves
(Phillips 1992; Hamuy et al. 1995, 1996a, 1996b; Riess, Press & Kirshner 1995, 1996;
Perlmutter et al. 1997); the relationships used in these analyses have come to be known
generically as "Phillips relations." When applied to nearby SNe Ia, these methods reduce
the dispersion of the distance moduli about the low-$z$ FRW distance modulus vs. redshift
relation to 0.15 mag (Hamuy et al. 1996a; Riess, Press & Kirshner 1996).

The goal of the high redshift supernova searches is to observe a large sample of
supernovae at relatively large $z$, and understand their properties well enough to infer reliable
distance moduli for them, allowing accurate determination of cosmological parameters. Two
experimental groups have recently announced and published results from their independent
programs to discover and study high redshift supernovae for this purpose (Perlmutter et al.
1997 and P98; R98). The resulting two data sets share many low redshift SNe discovered
by previous surveys, but include different high redshift SNe, and differ in their analysis
methods. We take advantage of the similarities and differences among the data and methods
used to assess the consistency or inconsistency of the assumptions underlying the analyses.

P98 have published data on 60 SNe Ia. Of these, 18 were discovered and measured in
the Calan-Tololo survey (all at low redshift; Hamuy et al. 1996c), and this group discovered
42 new SNe Ia at redshifts between 0.17 and 0.83. The $\mu$ values are inferred using the SF
light curve fitting method and are typically uncertain to ±0.2 magnitudes ("$1\sigma$"). The
SF method (Perlmutter et al. 1997; P98) is based on fitting a time-stretched version of a
single standard template to the observed light curves. The stretch factor, $s$, is then used to
estimate the absolute magnitude of the SNe Ia via a linear relationship that is determined
jointly with the cosmological parameters. The quoted $\mu$ values include a correction for
extinction in the Galaxy based on the detailed model of Burstein & Heiles (1982).

R98 have published results based on 50 SNe Ia. Of these, 37, including 27 at low
redshift ($z < 0.15$) and ten at high redshift ($z > 0.15$), have well-sampled light curves
in addition to spectroscopic information; the quoted "$1\sigma$" uncertainties for $\mu$ for these
SNe Ia are typically smaller than ±0.2 mag at high $z$ for determinations by either the MLCS
or the TF light curve fitting method. The data for 17 of the SNe Ia at low redshift come
from the Calan-Tololo survey (Hamuy et al. 1996c). We focus our attention on these 37
best-observed SNe Ia, which dominate the analysis in R98. These authors extensively
tabulate their reduced data and provide detailed information about their fitting techniques,
thus facilitating independent analysis of their conclusions.

R98 employ two different methods to estimate the distance modulus based on
information from the light curves. The TF method (Hamuy et al. 1996a) fits a set of
light curve templates with different values of $\Delta m_{15}$, the total decline in brightness from
peak to 15 days afterward, to observations of a particular SN Ia. By interpolating between
the values of $\chi^2$ for the fits to the various templates, the value of $\Delta m_{15}$ that minimizes
$\chi^2$ for the SN Ia is estimated. The peak absolute magnitude is deduced from the independently
calibrated linear relationship between $M$ and $\Delta m_{15}$. The MLCS method consists of fitting
an observed light curve to a superposition of a standard light curve and weighted additional
templates that parametrize the differences among SNe Ia (e.g., Riess, Press & Kirshner
1996); the outcome of the fits for a particular SN Ia consists of the weights associated with
the deviations from the standard, which in turn determine the difference between its peak
absolute magnitude and the standard's. The fits are done for more than one color, and
reddening and extinction are inferred from color dependences (Riess, Press & Kirshner
1996; R98). Originally the MLCS method used a rather small training set to determine the
requisite templates (Riess, Press & Kirshner 1996), but R98 now train on a considerably
larger set of nearby SNe Ia to find them. Both the MLCS and TF methods are calibrated
on nearby SNe Ia in the Hubble flow ($z < 0.15$) and then applied to the SNe Ia discovered
at high redshift. The quoted $\mu$ values include a correction for local extinction derived from
the Burstein & Heiles model, and in addition the MLCS method uses color dependence
to estimate corrections for the extinction and reddening due to absorbing material in the
host galaxy.

Schematically, we can consider a lightcurve fitting method to estimate the distance
modulus for supernova number $i$ according to the following model:

$$\mu_i = m_i - (M_0 + \Delta_i). \qquad (3)$$

Here $m_i$ is the peak apparent magnitude for the SN, $M_0$ is a fiducial absolute magnitude
(a single constant for a particular method), and $\Delta_i$ is a shift so that the peak absolute
magnitude for the SN is given by $(M_0 + \Delta_i)$. For the purposes of this paper, we have ignored
$K$-corrections and extinction in equation (3) (one can consider them to have been accounted
for in the $m_i$ estimates and their uncertainties). These corrections could potentially be
important sources of systematic error; the observing teams have gone to some lengths to
constrain the sizes of such errors. Here we concentrate on the possibility that systematic
errors are introduced by the lightcurve fitting algorithms entering the analyses via the shifts
$\Delta_i$. We will seek information about such errors by comparing the shifts across methods,
rather than through analysis of the internal consistency of a particular method.

The various fitting methods differ in how $m_i$ is interpolated from the observed
(incompletely sampled) lightcurve, in the choice for $M_0$, and in how the shifts $\Delta_i$ are
determined from the (multicolor) lightcurve shapes. The MLCS method provides $\Delta_i$
directly from fitting to a family of templates parameterized by $\Delta_i$. For the TF method,
$\Delta_i = \beta_{\rm TF}(\Delta m_{15} - 1.1)$, with $\beta_{\rm TF}$ a constant determined by fits. For both the MLCS
and TF methods, $M_0$ is inferred through the use of SNe Ia that have Cepheid distances,
and the various parameters specifying the shift as a function of the lightcurve shape are
set by analyses of low redshift SNe. For the SF method, $\Delta_i = \alpha(s - 1)$, where $s$ is the
above-mentioned stretch factor and $\alpha$ is a constant estimated jointly with cosmological
parameters in fits to the entire survey. $M_0$ is simply set at an arbitrary value; accordingly,
no attempt is made to infer $H_0$ using SNe analyzed with the SF method. In principle,
each of the quantities on the right hand side of equation (3) has uncertainty associated
with it, and the resulting errors in the estimates for these quantities are correlated. But
only the combination given by equation (3) appears in cosmological fits, so the lightcurve
fitting results can be summarized by the best-fit distance modulus $\mu_i$ and the total $\mu_i$
uncertainty $\sigma_i$ for each SN. These quantities, and the shifts $\Delta_i$, are the focus of our study.
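
The schematic model of equation (3), together with the TF and SF forms of the shift, can be written out as a short Python sketch. The function names and every numerical value below are hypothetical and chosen only for illustration; none are taken from R98 or P98.

```python
def shift_tf(dm15, beta_tf):
    """TF shift: Delta = beta_TF * (dm15 - 1.1)."""
    return beta_tf * (dm15 - 1.1)


def shift_sf(s, alpha):
    """SF shift: Delta = alpha * (s - 1)."""
    return alpha * (s - 1.0)


def distance_modulus(m_peak, m0, delta):
    """Equation (3): mu = m - (M0 + Delta)."""
    return m_peak - (m0 + delta)


# Illustrative numbers only (assumed values, not survey data).
mu_tf = distance_modulus(m_peak=23.0, m0=-19.3, delta=shift_tf(dm15=1.3, beta_tf=0.8))
```

Because $M_0$ enters only as an additive constant, changing it moves every $\mu_i$ by the same amount, which is why the SF method can leave it arbitrary at the cost of not constraining $H_0$.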

Figure 2 shows histograms of the shifts deduced from the MLCS (R98), TF (R98)
and SF (P98) methods for the observed SNe Ia. Since the choice of $M_0$ can vary from
method to method, we do not expect the histograms to be aligned. However, differences in
histogram shape would indicate that the various methods are correcting SNe in different and
possibly inconsistent ways. While the three methods claim to reduce the dispersion in the
magnitude-redshift relationship at low $z$, it is clear from the figure that they produce rather
different distributions of shifts. Although the SF method has been applied to a different set
of SNe Ia than the other methods, this alone cannot explain the obvious differences between
the shapes of the histograms (we note that 14 SNe are common to all three methods).
Most striking is that the distribution is extremely narrow for the SF method, indicating
that, by this measure, the P98 SNe Ia sample consists almost entirely of standard candles,
or that for this sample of SNe Ia, the adopted brightness-decline rate relation is not valid.
This suggests to us that these methods may be sensitive to different aspects of the SN
Ia phenomenon. A consequence of this is that if the properties of SNe Ia change with
redshift, the relationships between the $\mu$ estimates produced by the three methods could be
$z$-dependent. A search for such a dependence could thus provide information about redshift
dependence of SNe Ia properties. In the following section we use exploratory methods to
search for evidence of this and other kinds of dependences.

3. Sleuthing 

Our approach in this section is driven by our belief that it is not sufficient to settle for 
the consistency of the final cosmological inferences of the MLCS, TF and SF analyses. We 
should expect consistency between them (statistically) on a supernova-by-supernova basis 
where such a comparison is possible. 

Since R98 use two different methods to compute distance moduli for their sample of
37 SNe Ia, we can compare the results and search for systematic differences between them.
Both the TF and MLCS techniques are calibrated using the same set of nearby SNe Ia in
the Hubble flow and there is only one set of observational data for each SN Ia; consequently,
the uncertainties for the two methods are highly correlated. Another comparison set is the
group of 14 supernovae from the Calan-Tololo survey that are included in both R98 and
P98. Since all fitting methods make use of the same published light curves for this sample
of 14 events, the inferred quantities for this sample will also be highly correlated.

3.1. Pointwise Consistency 

In Figure 3, we compare the distance moduli measured with the different techniques on
common samples of SNe Ia. In Figure 3a, we show $\mu_{\rm MLCS}$ vs. $\mu_{\rm TF}$ for the 10 high redshift
SNe Ia analyzed in R98, and in Figure 3b, we show $\mu_{\rm MLCS}$ vs. $\mu_{\rm SF}$ for the 14 common
Calan-Tololo SNe Ia analyzed by both SF and MLCS methods. The error bars
are derived from the uncertainties in the individual distance moduli for each supernova
(R98, P98), except that we have removed the contribution associated with the intrinsic
dispersion of the SNe Ia sample, estimated to be $\sigma_{\rm int} = 0.10$ at low $z$, $\sigma_{\rm int} = 0.15$ at high $z$,
and we have removed the contribution associated with the peculiar velocity of the SNe Ia
($\sigma_v = 300$ km s$^{-1}$ in P98, $\sigma_v = 200$ km s$^{-1}$ in R98). Both the errors in the distance modulus from intrinsic
dispersion in the sample and from the peculiar velocity of the SNe are completely correlated
among the different methods. We have not removed the correlations due to $K$-corrections,
photometry and extinction (e.g., Schmidt et al. 1998), because there is insufficient published
information for us to do so properly; consequently, we have overestimated the uncorrelated
portion of the distance modulus error somewhat.

From Figure 3 it is clear that the estimates for the distance moduli from the different
methods are strongly correlated, as they should be. However, there is more dispersion in
these plots than we would expect based on the quoted errors. A fit of a straight line of
slope 1 gives a $\chi^2/\nu$ (with $\nu$ the number of degrees of freedom) of 22.8/9 for Figure 3a and
21.2/13 for Figure 3b, indicating that there are errors associated with the analysis methods
that have not been accounted for.
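
A fit of this kind can be sketched in Python as follows. The sketch assumes that the only free parameter is the intercept of the slope-1 line, which is consistent with the quoted 9 degrees of freedom for 10 SNe and 13 for 14 SNe, and that a single uncorrelated error is assigned to each difference between the two moduli. The function name and signature are hypothetical.

```python
def chi2_unit_slope(x, y, sigma):
    """Chi-square of the line y = x + b with b fitted; sigma is the error on (y - x)."""
    w = [1.0 / s ** 2 for s in sigma]
    d = [yi - xi for xi, yi in zip(x, y)]
    b = sum(wi * di for wi, di in zip(w, d)) / sum(w)  # weighted-mean offset
    chi2 = sum(wi * (di - b) ** 2 for wi, di in zip(w, d))
    return chi2, len(d) - 1, b
```

The function returns $\chi^2$, the number of degrees of freedom $\nu = n - 1$, and the fitted offset; a value of $\chi^2/\nu$ well above unity signals scatter beyond what the assigned errors allow.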

We can pursue this type of comparison further with the R98 data, where all of the SNe
Ia have been fully analyzed with two independent methods. In Figure 4 we compare the
MLCS and TF estimates of various quantities that are used in inferring the distance moduli
of the SNe Ia events. For the 37 SNe Ia analyzed in R98, Figure 4a shows the host galaxy
extinction, $A$, Figure 4b shows the correction to the absolute magnitude, $\Delta$, and Figure 4c
illustrates the peak apparent magnitude, $m$, calculated with the MLCS and TF analysis methods. (The
individual errors for the extinction and $\Delta$ are not published but can be crudely estimated
to be of order 0.1 magnitudes.) Again, there is more dispersion evident in these plots than
might be expected from the quoted or estimated errors, except for the correlation plot of
$m_{\rm MLCS}$ versus $m_{\rm TF}$. The peak apparent magnitudes inferred via the two methods, which
are the quantities most directly related to the raw data, are in excellent agreement.



3.2. Redshift and Luminosity Dependence 

In Figure 5, we plot the difference $\Delta\mu = \mu_{\rm MLCS} - \mu_{\rm TF}$ between the distance moduli
determined from MLCS and TF respectively, as a function of $z$. The error bars for $\Delta\mu$ are
derived from the uncertainties in the individual distance moduli except that, as described
above, we have removed the contribution associated with the intrinsic dispersion of the
SNe Ia sample. Formally, we use $\sigma^2_{\Delta\mu} = \sigma^2_{\rm MLCS} + \sigma^2_{\rm TF} - 2\sigma^2_{\rm int}$ to calculate the error bars
shown in Figure 5. Although the data are somewhat scattered at both high and low $z$,
Figure 5 shows that the MLCS and TF methods agree rather well at low $z$ apart from
significant dispersion ($\sigma \sim 0.2$ mag), but there are hints of disagreement at large $z$, where
the dispersion, at least, appears larger, and the mean may also be shifted.

Footnote: The error due to intrinsic dispersion in the SNe Ia sample is estimated to be somewhat larger in P98:
$\sigma_{\rm int} = 0.17$. For the purposes of comparison we remove the smaller estimated value of the correlated error.
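
The error bar on the difference of two correlated distance moduli can be sketched as below. This assumes that the intrinsic-dispersion term is fully shared between the two methods, so that the variance of the difference is $\sigma^2_{\rm MLCS} + \sigma^2_{\rm TF} - 2\sigma^2_{\rm int}$. The function name and the guard against a negative variance are additions made for this example.

```python
import math


def sigma_delta_mu(sigma_mlcs, sigma_tf, sigma_int):
    """Uncorrelated error on mu_MLCS - mu_TF after removing the shared intrinsic term."""
    var = sigma_mlcs ** 2 + sigma_tf ** 2 - 2.0 * sigma_int ** 2
    if var < 0:
        # Assumed guard: quoted errors smaller than the intrinsic term are inconsistent.
        raise ValueError("quoted errors are smaller than the intrinsic term")
    return math.sqrt(var)
```

For example, two quoted errors of 0.2 mag with $\sigma_{\rm int} = 0.15$ leave an uncorrelated error of about 0.19 mag on the difference.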

While it is possible that the appearance of Figure 5 at large $z$ merely reflects small
number statistics, Figure 6 suggests that the incompatibility between TF and MLCS could
be systematic. In Figure 6, we plot $\Delta\mu$ versus $M_B^{\rm AV}$, an estimated absolute magnitude,
defined by

$$M_B^{\rm AV} = (M_B^{\rm MLCS} + M_B^{\rm TF})/2 \qquad (4)$$

where

$$M_B^{\rm MLCS} = m_B^{\rm MLCS} - \mu_{\rm MLCS} - A_B^{\rm MLCS}$$
$$M_B^{\rm TF} = m_B^{\rm TF} - \mu_{\rm TF} - A_B^{\rm TF} \qquad (5)$$

and $A_B^{\rm MLCS(TF)}$ is an estimate of the extinction due to the host galaxy in the MLCS (TF)
correction scheme. For $z > 0.15$, R98 provided all of the information necessary to calculate
$M_B^{\rm MLCS}$, $M_B^{\rm TF}$ and $M_B^{\rm AV}$; at low $z$, $M_B$ was only given for a subset of the SNe Ia
in Hamuy et al. (1996a).

According to Figure 6, the difference between $\mu_{\rm MLCS}$ and $\mu_{\rm TF}$ appears to be correlated
with the estimated intrinsic brightness, $M_B^{\rm AV}$, at high $z$, but not at low $z$. (Recall that the
error bars on $\Delta\mu$ are overestimates, as explained above.) A similar correlation is evident
if $\Delta\mu$ is plotted against $\Delta_{\rm MLCS}$ ($\Delta_{\rm TF}$), the difference in maximum absolute magnitudes for
the observed SNe Ia and the fiducial SNe Ia according to the MLCS (TF) method. (R98
tabulates $\Delta_{\rm MLCS}$ and $\Delta_{\rm TF}$ for all SNe Ia in their sample.) Figure 6 suggests that, at high
$z$, one of the analysis schemes, MLCS or TF, either under-corrects or over-corrects for the
luminosity variations in the SNe Ia sample. Since no such systematic trend is evident at
low $z$, we cannot know which method, if either, yields the more accurate value of distance
modulus. It is relatively uncontroversial to say that the two methods are not identical,
either at low or high $z$, and hence must probe SNe Ia physics in slightly different, and
as yet ill-understood, ways (Höflich & Khokhlov 1996; Höflich, Wheeler & Thielemann
1998). Thus, the indications of $z$-dependence implied by Figures 5 and 6, while still based
on relatively few events, are not especially surprising.



Footnote: The zero point reference for $M_B$ may be somewhat different for the high redshift and low redshift data,
which may account for the fact that in Figure 6, the low redshift supernovae seem to be, on average, less
luminous by about 0.5 magnitudes. Our conclusions are robust against a shift in the zero point of the
magnitude scale for the low redshift supernovae.

A worrisome feature of Figures 5 and 6 is that imperfect corrections for luminosity
variations can alter the conclusions about $\Omega_M$ and $\Omega_\Lambda$ that we draw from these data. To
illustrate this point, we computed $1\sigma$ confidence contours in $(\Omega_\Lambda, \Omega_M)$ space with separate
fits to intrinsically bright and intrinsically dim SNe Ia; the results are shown in Figure
7. The separation into 'bright' and 'dim' was somewhat arbitrary, and we have verified
that making different choices does not affect the overall conclusion. The $1\sigma$ confidence
level contours for the combined data (all $M_B^{\rm AV}$) are also shown as dashed curves. Figure
7 indicates a systematic difference between the cosmology favored by intrinsically bright
versus intrinsically dim SNe Ia when the MLCS method is used; the effect is much less
pronounced for TF and seems to be of the opposite sign for the SF method. The trend
may be understood if the MLCS (SF) method tends to overestimate (underestimate) the
luminosities of intrinsically bright SNe Ia at high redshift. Such a trend for the MLCS data
is also consistent with Figure 6.

The set of plots in Figures 2 through 7 indicates that the analysis methods disagree
in their inferences of $\mu$, $A$, and $\Delta$ at a level that is not covered by the quoted errors. We
can only speculate on the sources of the discrepancies. However, until these methods are
understood more systematically, it will be difficult to avoid assigning additional systematic
errors to the measured distance moduli with sizes that reflect the systematic differences
between the methods, and this will weaken the statistical significance of the results
substantially.

3.3. Validity of Phillips Relations 

So far we have been investigating the light curve fitting methods as possible sources
of systematic error. Potentially, there are other effects that can mimic cosmology that
are extremely difficult to constrain with the present data. The most pernicious, discussed
at some length by the observers themselves, is evolution of the SN Ia population. It is
extremely difficult to put reliable quantitative limits on evolution, and it cannot be excluded
conclusively using the currently available spectral and color information. Furthermore,
there is already some evidence in the current data that the high redshift sample does not
have the same properties as the low redshift sample.

Footnote: In preparing Figure 7, we have included a contribution to the uncertainty arising from dispersion in
galaxy redshift using the technique described in R98.

Footnote: For the plots shown, we have chosen $M_B^{\rm AV} < -19.45$ as intrinsically bright and $M_B^{\rm AV} > -19.45$ as
intrinsically dim for MLCS and TF data. For the SF data, we separated intrinsically bright from intrinsically
dim using $\alpha(s - 1) > 0$ and $\alpha(s - 1) < 0$, respectively. Note that $\alpha(s - 1)$ can be calculated from the
information in Tables 1 and 2 of P98.

The strongest evidence that the lightcurve corrections improve our knowledge of the
SN absolute magnitude would be a demonstration that they reduce the dispersion of the SN
distance moduli about the best-fit cosmology. (For example, Riess, Press & Kirshner 1996
showed that MLCS reduces the dispersion about Hubble's Law for low $z$ SNe Ia.) To test
this, we have compared the dispersion between the data and the predictions of the best-fit
cosmology with and without the corrections, $\Delta_i$, inferred from the light curve fitting. We
adopt the quantity

i 

as a measure of the dispersion, where $\hat\mu_i$ is the estimated distance modulus for SN $i$, $z_i$ is
its redshift, the function $f(z)$ is defined by equation (1), and $N$ is the number of SNe Ia in
the sample. We compute $D$ separately for high and low $z$; the results are given in Table 1.
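
As an illustration of the kind of statistic described here, the following Python sketch computes an unweighted root-mean-square residual of distance moduli about a model curve. The exact normalization of $D$ (RMS versus mean square, weighted versus unweighted) is an assumption, as are the function names, the toy low-redshift Hubble-law curve, and the toy numbers; only the ingredients $\hat\mu_i$, $z_i$, $f(z)$, and $N$ are taken from the text.

```python
import math

def dispersion(mu_hat, z, f):
    """Unweighted RMS residual of estimated distance moduli about f(z).

    Assumed form: D = sqrt( (1/N) * sum_i [mu_hat_i - f(z_i)]^2 ).
    """
    if len(mu_hat) != len(z) or not mu_hat:
        raise ValueError("mu_hat and z must be non-empty and of equal length")
    n = len(mu_hat)
    total = sum((m - f(zi)) ** 2 for m, zi in zip(mu_hat, z))
    return math.sqrt(total / n)

def hubble_law_modulus(z, h=0.65):
    """Toy f(z): low-redshift distance modulus, 5 log10(cz/H0 / Mpc) + 25."""
    c_km_s = 299792.458
    return 5.0 * math.log10(c_km_s * z / (100.0 * h)) + 25.0

# Hypothetical toy sample with residuals of +0.1, -0.1, +0.2, -0.2 mag.
z_toy = [0.02, 0.03, 0.05, 0.08]
offsets = [0.1, -0.1, 0.2, -0.2]
mu_toy = [hubble_law_modulus(zi) + d for zi, d in zip(z_toy, offsets)]

D = dispersion(mu_toy, z_toy, hubble_law_modulus)
print(round(D, 4))
```

Computing the same quantity with and without the light curve corrections applied to `mu_hat`, separately for the low and high redshift subsamples, gives the comparison described in the text.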

For both MLCS and TF, which are calibrated on low $z$ SNe Ia, we see that the
dispersion of the low redshift data is reduced substantially by incorporating the corrections
derived from the relation between light curve shape and luminosity at maximum brightness.
At high redshift, no such improvement is seen. The dispersion of the high $z$ data about the
best-fit cosmology is virtually unchanged by the incorporation of either the MLCS or TF
corrections.

For SF, there is little evidence from Table 1 that the corrections reduce the dispersion
in the data at all. Recall that in the SF parameterization, the relation between light curve
width and luminosity corrections is parameterized by $\Delta_{SF} = \alpha(s - 1)$, where $\alpha$ is inferred
from a global fit to the data at all redshifts. As was shown in Figure 2, the corrections $\Delta_{SF}$
are quite small, so it is not surprising that they do little to reduce the dispersion in the
data. As stated before, the SF method finds little, if any, correlation between light curve
width and absolute luminosity when averaged over all redshifts. What is startling is that
the low redshift sample used in P98 is almost identical to the "peak subsample" of Hamuy
et al. (1996a). As detailed in that reference, that low redshift sample does
show a significant correlation between light curve width and peak luminosity. If a strong
correlation is present in the low redshift sample and only a very weak correlation is evident
in the full sample, one is led to suspect that the correlation is not present in the high
redshift sample; the large number of high-redshift SNe leverages the joint fit.



^We redetermine the best-fit cosmology when we remove the corrections.



4. Accounting For Possible Evolution

Both R98 and P98 assume implicitly that the same light curve fitting methods may
be applied at all redshifts sampled. This assumption is valid only if the light curve
shape is correlated with peak luminosity in the same way at both high and low redshift.
Given the evidence we have presented that this may not be true, which indicates, at least
circumstantially, that the SNe Ia population evolves, we feel it is necessary to explore
whether the data published so far are actually able to distinguish the effects of
evolution from those of cosmology.

Such effects fall under the rubric of "systematic errors" — because they are not 
"random," their effects on one's final inferences are difficult to account for in the 
conventional frequentist approach to statistical inference. However, both teams have 
adopted the Bayesian approach for their final analyses (though not for all intermediate 
stages of their analyses). As noted by Jeffreys (1961), the Bayesian approach is particularly 
apt for studying the effects of systematic error because of its broader notion of uncertainty. 
A Bayesian probability density describes how probability is distributed among the possible 
values of a parameter, rather than how values of the parameter are distributed among some 
hypothetical population. This permits statistical calculations with quantities that are not 
"random" in the frequentist sense. In particular, as Jeffreys noted, systematic errors are 
treatable simply by introducing parameterized models for the errors and marginalizing 
(integrating over) the extra parameters to obtain one's final inferences. 

This procedure, when followed blindly, has the potential to weaken one's conclusions 
unjustifiably. For example, one could simply introduce a systematic dependence that 
is identical to the physical dependence one is studying, but with a duplicated set of 
parameters. This duplication would prevent useful constraints from being placed on the 
parameters, since any measured effect could be "blamed" on the duplicated systematic 
dependence. Thus Jeffreys emphasized the need to compare models with and without 
systematic error terms using the ratio of the model probabilities, the odds favoring one 
model over another. The odds can be written as the product of the prior odds (expressing 
information from other data, or possibly a subjective comparison of the models) and a 
Bayes factor determined entirely by the data, the models, and the sizes of the model 
parameter spaces. If we know or strongly believe a systematic effect to be present without 
consideration of the new data before us, then obviously the systematic error model should 
be used; the prior odds would lead us to this conclusion even if the Bayes factor is indecisive. 
If we have no strong prior evidence for a systematic error, one takes the prior odds to be 
unity and relies on the data alone for determining if the effect is present, taking the Bayes 
factor to be the odds. An appealing aspect of Bayesian model comparison is that the
Bayes factor implements an automatic "Ockham's razor" that penalizes models for the sizes
of their parameter spaces. Thus model complexity is accounted for by the Bayes factor.
Except in unusual cases, needlessly increasing a model's complexity by simply duplicating 
terms prevents the Bayes factor from favoring the more complicated model. We provide a 
brief review of Bayes factors in Appendix A; standard references reviewing their use are 
Kass and Raftery (1995) and Wasserman (1997). 
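
The Ockham penalty can be seen in a toy problem that is not the SNe Ia analysis itself. Suppose a single measurement $x$ with Gaussian error $s$ is used to compare a model in which an offset is fixed at zero with a model in which the offset is free and has a zero-mean Gaussian prior of width $w$. Both marginal likelihoods are then Gaussian, so the Bayes factor has a closed form. The sketch below, in Python, uses illustrative names and numbers that are assumptions of this example.

```python
import math

def normal_pdf(x, var):
    """Density of a zero-mean Gaussian with variance var, evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def bayes_factor_offset(x, s, w):
    """Bayes factor for 'free offset, prior N(0, w^2)' versus 'offset = 0'.

    Marginalizing the offset broadens the predictive distribution for x
    from variance s^2 to variance s^2 + w^2.
    """
    return normal_pdf(x, s * s + w * w) / normal_pdf(x, s * s)

s = 0.1  # measurement error (toy value)
for w in (0.1, 0.3, 1.0):
    on_null = bayes_factor_offset(0.0, s, w)   # data agree with offset = 0
    off_null = bayes_factor_offset(0.3, s, w)  # data 3 sigma away from 0
    print(w, round(on_null, 3), round(off_null, 3))
```

When the measurement agrees with zero offset, the Bayes factor falls below one and keeps falling as the prior widens; when the measurement is discrepant, an excessively wide prior still dilutes the support for the more complex model.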



4.1. Systematic Error in $H_0$

To illustrate this approach, we show how it can be used to quantitatively account for
systematic error introduced by the uncertain Cepheid distances used to infer $M_0$ in the
MLCS and TF methods. (The SF analysis used a "Hubble-constant-free" parameterization
and thus could avoid explicit treatment of $M_0$ and the Hubble constant.) We write the true
value of $M_0$ as $(\hat M_0 + \delta)$, where $\hat M_0$ is the estimate used for calculating $\hat\mu_i$, and the new
term, $\delta$, represents the constant (but unknown) error introduced by using Cepheid data to
calculate $\hat M_0$. We describe the likelihood function for analyzing the SNe Ia data in some
detail in Appendix B. The final (approximate) likelihood is equivalent to what one would
find from modelling the tabulated $\hat\mu_i$ estimates according to

$$\hat\mu_i = f(z_i) + \delta + n_i \qquad (7)$$
$$\phantom{\hat\mu_i} = g(z_i) - \eta + \delta + n_i \qquad (8)$$

where $f(z_i)$ is the cosmological distance modulus relation defined in equation (1), and $n_i$
is a random error term whose probability distribution is a Gaussian with zero mean and
standard deviation $\sigma_i$. In the second line, we have separated out the $H_0$ dependence of $f(z_i)$
into

$$\eta = 5 \log(h / c_2) - 25, \qquad (9)$$

where $H_0 = h \times 100$ km s$^{-1}$ Mpc$^{-1}$, and $c_2$ is the speed of light in units of $10^2$ km s$^{-1}$; $g(z_i)$
contains the remaining $\Omega_M$- and $\Omega_\Lambda$-dependent part of $f(z_i)$.

It is clear from equation (8) that $\eta$ (and thus $H_0$) is degenerate with $\delta$; we cannot hope
to learn about one without independent knowledge of the other. But $\delta$ is constrained by our
knowledge of the uncertainty of the Cepheid distance scale. In particular, R98 summarize
the uncertainties as introducing an error with a standard deviation of $d = 0.21$ magnitudes
(corresponding to ~10% uncertainty in $H_0$). We account for this by introducing a prior
distribution for $\delta$ that is a zero-mean Gaussian with standard deviation $d$.
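
The correspondence between a 0.21 magnitude offset and a roughly 10% uncertainty in $H_0$ can be checked with a few lines of Python. The sketch assumes only the form of $\eta$ given in equation (9) and the fact that $\eta$ and $\delta$ enter the model through the combination $\eta - \delta$; the function names are illustrative.

```python
import math

C2 = 2997.92458  # speed of light in units of 100 km/s

def eta(h):
    """eta = 5 log10(h / c_2) - 25, the H0-dependent part of the modulus."""
    return 5.0 * math.log10(h / C2) - 25.0

def fractional_h_shift(delta_mag):
    """Fractional change in h that absorbs a magnitude offset delta_mag.

    Only eta - delta appears in the model, so shifting delta by delta_mag
    is equivalent to multiplying h by 10**(delta_mag / 5).
    """
    return 10.0 ** (delta_mag / 5.0) - 1.0

print(round(eta(0.65), 3))
print(round(fractional_h_shift(0.21), 3))
print(round(fractional_h_shift(-0.21), 3))
```

To first order the fractional shift is $(\ln 10 / 5)\, d \approx 0.46\, d$, which is why a prior width of $d = 0.21$ mag maps onto an uncertainty of about 10% in $h$.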



Our model now has four parameters, $\delta$, $h$, $\Omega_M$, and $\Omega_\Lambda$. The likelihood function for the
data is the product of Gaussian distributions specified by equation (7) and is proportional
to the exponential of a familiar $\chi^2$ statistic. The full joint posterior distribution is the
product of this and priors for the parameters, including the informative prior for $\delta$.
We can summarize our inferences for the cosmological parameters by integrating over $\delta$;
this can be done analytically and is described in Appendix B. If we want to focus on the
conclusions for $h$, we numerically integrate over $\Omega_M$ and $\Omega_\Lambda$. The result, for the MLCS
data, is the marginal distribution for $h$ shown as the rightmost solid curve in Figure 8. The
best-fit value is $h = 0.645$, and a 68.3% credible region has a half-width $\sigma_h = 0.063$. This
is approximately equal to the "total uncertainty" on $H_0$ estimated by R98 using standard
"rules of thumb" for accounting for systematic error; we have shown how this estimate could
be justified by a formal calculation. For the TF data, the marginal posterior is plotted as
the leftmost solid curve in Figure 8, and $h = 0.627 \pm 0.062$.
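
The analytic integration over $\delta$ is an instance of a standard Gaussian integral. As a minimal sketch (not the calculation of Appendix B itself), the Python code below marginalizes a common offset with a zero-mean Gaussian prior of width $d$ out of a product of Gaussian likelihood factors, and checks the closed form against brute-force numerical integration. The residuals, errors, and function names are hypothetical.

```python
import math

def log_like_marginal(resid, sigma, d):
    """ln of the likelihood with a common offset delta integrated out.

    resid[i] is mu_hat_i minus the model prediction without delta;
    delta has a zero-mean Gaussian prior with standard deviation d.
    """
    a = sum((r / s) ** 2 for r, s in zip(resid, sigma))
    b = sum(r / s ** 2 for r, s in zip(resid, sigma))
    c = sum(1.0 / s ** 2 for s in sigma) + 1.0 / d ** 2
    log_norm = -sum(math.log(math.sqrt(2.0 * math.pi) * s) for s in sigma)
    # Completing the square in delta leaves a reduced chi^2, a - b^2/c,
    # and a volume factor 1 / (d * sqrt(c)).
    return log_norm - 0.5 * (a - b * b / c) - math.log(d * math.sqrt(c))

def log_like_numeric(resid, sigma, d, n_steps=4001, half_width=8.0):
    """Brute-force check: trapezoid-rule integral over delta."""
    lo, hi = -half_width * d, half_width * d
    step = (hi - lo) / (n_steps - 1)
    total = 0.0
    for k in range(n_steps):
        delta = lo + k * step
        log_l = sum(-0.5 * ((r - delta) / s) ** 2
                    - math.log(math.sqrt(2.0 * math.pi) * s)
                    for r, s in zip(resid, sigma))
        log_p = -0.5 * (delta / d) ** 2 - math.log(math.sqrt(2.0 * math.pi) * d)
        weight = 0.5 if k in (0, n_steps - 1) else 1.0
        total += weight * math.exp(log_l + log_p)
    return math.log(total * step)

# Hypothetical residuals (mag) and errors; d = 0.21 mag as in the text.
resid = [0.15, 0.05, 0.30, 0.10]
sigma = [0.20, 0.15, 0.25, 0.20]
d = 0.21
print(log_like_marginal(resid, sigma, d), log_like_numeric(resid, sigma, d))
```

The term $b^2/c$ is what the offset can absorb: the marginalized statistic is insensitive to a shift common to all residuals to the extent that the prior width $d$ allows it.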

Of greater current interest are the implications for the density parameters. The
marginal distribution for $\Omega_M$ and $\Omega_\Lambda$ is found by integrating out $\delta$ and $h$. This can be done
analytically (see Appendix B). Contours of the resulting distributions, found using both
the MLCS and TF data, appear in Figure 9. They are identical to contours found using a
model without $\delta$, and essentially reproduce the results reported in R98 (minor differences
result from our omission of the "snapshot" SNe).

In this case, we know that $M_0$ has been estimated using the Cepheid data, and that
this estimate has systematic error. Formally, the prior odds favoring the model with $\delta$ over
one with $\delta = 0$ are thus infinite. The Bayes factor comparing these models is exactly equal
to one (this is because the SNe Ia data can tell us nothing about $\delta$; see Appendix A for
discussion of this property of Bayes factors), so the posterior odds are equal to the prior
odds. Since we know these errors to be present, we take this $\delta$ model to be our "default"
model when calculating subsequent Bayes factors in this section.

We conclude our discussion of this model by summarizing the evidence in the data
for a nonzero cosmological constant, presuming the $\delta$ model to be true. In R98 and P98,
the marginal posterior probability that $\Omega_\Lambda > 0$ was presented as such a summary; this
probability was found to equal 99.6% ($2.9\sigma$), 99.99% ($3.9\sigma$), and 99.8% ($3.1\sigma$) in the MLCS,
TF, and SF analyses, respectively, apparently indicating strong evidence that $\Omega_\Lambda$ is nonzero.
But this quantity is not a correct measure of the strength of the evidence that $\Omega_\Lambda \neq 0$.
This probability would equal unity if negative values of $\Omega_\Lambda$ were considered unreasonable
a priori, yet presumably even in this case one would not consider the data to demand a
nonzero cosmological constant with absolute certainty. The correct quantity to calculate is
the odds in favor of a model with $\Omega_\Lambda \neq 0$ versus a model with $\Omega_\Lambda = 0$. Considering such
models to be equally probable a priori, this is given by the Bayes factor comparing these
models. We find $B = 5.4$ using the MLCS data and $B = 6.8$ using the SF data, each
indicating positive but not strong evidence for a nonzero cosmological constant (presuming
there is no evolution). The TF data give $B = 86$, indicating strong evidence for a nonzero
$\Omega_\Lambda$ (again, presuming no evolution). Without clear criteria identifying one method as
superior to the others, the data are equivocal about a nonzero cosmological constant, even
without accounting for the effects of possible evolution.

^The priors for $\Omega_M$ and $\Omega_\Lambda$ we take to be flat over the region shown in our plots, excluding the "No Big
Bang" region; the prior for $h$ we take to be flat in the logarithm.

Similarly, R98 report the number of standard deviations that the $\Omega_M = 1$, $\Omega_\Lambda = 0$
point is away from the best-fit $(\Omega_M, \Omega_\Lambda)$ as a measure of the evidence against the hypothesis
that matter provides the closure density; they state this hypothesis is ruled out at the $7\sigma$
and $9\sigma$ levels using the MLCS and TF methods, respectively. Again, a proper assessment
of the hypothesis that $\Omega_M = 1$ and $\Omega_\Lambda = 0$ requires that one give this hypothesis a finite,
nonzero prior probability. For the MLCS method, the Bayes factor favoring a model with
any $(\Omega_M, \Omega_\Lambda)$ over one with $\Omega_M = 1$ and $\Omega_\Lambda = 0$ is $B = 2.3 \times 10^5$, so that with equal
prior odds the probability for the latter model is $p = 1/(1 + B) \approx 5 \times 10^{-6}$ (~$4.5\sigma$). For
the TF method, we find $B = 2.1 \times 10^8$ so that $p \approx 5 \times 10^{-9}$ (~$5.8\sigma$). These are small
probabilities and indicate very strong evidence against the simpler model, but they are
much larger than the probabilities associated with $7\sigma$ and $9\sigma$ significances ($\approx 2 \times 10^{-11}$ and
$3 \times 10^{-18}$, respectively, for two degrees of freedom). The incorrect summary statistics used
in the previous analyses have exaggerated the evidence for a nonzero cosmological constant,
irrespective of whether or not one considers the effects of evolution.
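
The arithmetic connecting Bayes factors, posterior probabilities, and "number of sigma" statements can be sketched in Python. The Bayes factors 5.4, 6.8, and 86 are the values quoted above for the $\Omega_\Lambda \neq 0$ versus $\Omega_\Lambda = 0$ comparison; the function names are illustrative, and the tail probability assumes that an $n\sigma$ contour in a two-parameter Gaussian corresponds to $\Delta\chi^2 = n^2$ with two degrees of freedom.

```python
import math

def prob_simpler_model(bayes_factor, prior_odds=1.0):
    """Posterior probability of the simpler model, given the Bayes factor
    (times prior odds) favoring the more complex one: p = 1 / (1 + B)."""
    return 1.0 / (1.0 + bayes_factor * prior_odds)

def tail_prob_two_params(n_sigma):
    """Probability that chi^2 with 2 degrees of freedom exceeds n_sigma^2."""
    return math.exp(-0.5 * n_sigma ** 2)

# Probability of Omega_Lambda = 0 with equal prior odds, per method.
for label, b in (("MLCS", 5.4), ("SF", 6.8), ("TF", 86.0)):
    print(label, round(prob_simpler_model(b), 3))

# Tail probabilities implied by 7 sigma and 9 sigma for two parameters.
print(tail_prob_two_params(7.0), tail_prob_two_params(9.0))
```

The comparison makes the point of the text concrete: a probability obtained from a Bayes factor with equal prior odds can be many orders of magnitude larger than the tail probability suggested by the distance of a point from the best fit.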

4.2. Models With Evolution 

Without a detailed physical idea of the cause of evolution, we cannot explore truly
realistic models. Instead we consider two illustrative examples. We first consider a model
(Model I) that generalizes the $\delta$ model just described by adding an offset, $\epsilon$, for
the high redshift SNe; we apply this model only to data from R98. As a model of physical
evolution, this is certainly too simple, but it is illustrative since for the R98 sample, the
observed high redshift SNe Ia are all in a fairly narrow band in redshift near $z \sim 0.5$.
Essentially this model merely permits differences in luminosities between $z \lesssim 0.1$ and
$z \sim 0.5$ as a consequence of evolution. On a purely phenomenological level, the model might
be considered more realistic because the low and high redshift SNe are not treated equally
in the R98 analysis: the MLCS and TF relations are calibrated using only low redshift SNe.
Thus this model can be understood as allowing for a systematic offset when extrapolating
the methods beyond the training set. For this model, we use the same prior for $\delta$ as in our
default model (zero-mean Gaussian with standard deviation $d = 0.21$ mag). The prior for
$\epsilon$ we also take to be a zero-mean Gaussian but with a different width, $e$. The prior width,
$e$, can be viewed as a description of the scale of errors we might expect from extrapolating
low redshift properties to high redshift.

^These Bayes factor calculations can also be viewed as providing the posterior probability that $\Omega_\Lambda = 0$ by
putting a prior probability of 0.5 on the $\Omega_\Lambda = 0$ line; in the calculations reported in R98 and P98, this line
has zero prior probability (only finite intervals in $(\Omega_M, \Omega_\Lambda)$ have nonzero prior probability in their analyses).

Physically, we might expect evolution to lead to continuous variation of SN Ia
properties with redshift. Also, the P98 analysis uses low and high redshift SNe Ia together
in calibrating the luminosity/decline rate relation, so there is no clear separation of their
data into low and high $z$ subsamples. Thus, Model I is inappropriate for phenomenological
modeling of systematic effects from lightcurve fitting of P98 data. Consequently, we
consider a second model (Model II) which assumes that the intrinsic luminosities of SNe
Ia scale like a power of $1 + z$ as a result of evolution. This second model corresponds to
replacing equation (8) with

$$\hat\mu_i = g(z_i) - \eta + \delta + \beta \ln(1 + z_i) + n_i, \qquad (10)$$

where $\delta$ again represents Cepheid uncertainty in $M_0$ (relevant only when we apply this
model to MLCS or TF data), and $\beta$ parameterizes the evolution. We use a Gaussian prior
for $\beta$ with zero mean and standard deviation $b$.

For both models, we explore the dependence of the results on the prior width ($e$ and $b$)
to see how external constraints on evolution (presently unknown) could affect the analysis.
We examine values that allow evolutionary changes of up to a few tenths of a magnitude for
sources with $z \sim 1$. These characteristic magnitude shifts are comparable to the intrinsic
dispersion seen in low redshift SNe Ia (Schmidt et al. 1998), which may be taken as a
rough indication of the range of variation of peak magnitude with physical conditions in the
explosions. Some current theoretical studies of possible sources of $z$-dependent variations
in SNe Ia luminosities also find magnitude changes of this size to be reasonable (see, e.g.,
Höflich, Wheeler, and Thielemann 1998; Domínguez et al. 1999).

The new parameters in both models appear linearly in the model equations and can
thus be marginalized analytically. Appendix B describes the calculations. Reality could be
and probably is far more complicated than either model, but the sparsity of the present
data does not justify consideration of more sophisticated models.
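
The two evolutionary terms are simple enough to write down directly. The Python sketch below evaluates the magnitude shift each model adds to the predicted distance modulus. The dividing redshift `z_split` in Model I is a hypothetical choice made only for this illustration (the text specifies only that the low redshift sample lies at $z \lesssim 0.1$ and the high redshift sample near $z \sim 0.5$), and the values 0.25 and 0.5 are the prior widths $b$ used in the analysis of Model II, inserted here as trial values of $\beta$.

```python
import math

def shift_model_1(z, epsilon, z_split=0.15):
    """Model I: constant offset epsilon applied to high-redshift SNe only.

    z_split is a hypothetical dividing redshift, not a value from the text.
    """
    return epsilon if z > z_split else 0.0

def shift_model_2(z, beta):
    """Model II: continuous offset beta * ln(1 + z)."""
    return beta * math.log1p(z)

# Size of the Model II shift at z = 0.5 and z = 1 when beta equals
# one prior standard deviation.
for b in (0.25, 0.5):
    print(b, round(shift_model_2(0.5, b), 3), round(shift_model_2(1.0, b), 3))
```

With $\beta$ of order 0.25 to 0.5, the shift at $z \sim 1$ is $\beta \ln 2$, that is, one to a few tenths of a magnitude, matching the range of evolutionary changes described above.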

4.2.1. Model I

Figure 10a shows contours of the marginal density for $\Omega_M$ and $\Omega_\Lambda$ using Model I to
analyze the MLCS data and taking the prior width for $\epsilon$ to be $e = 0.1$ mag. Figure 10b
shows similar results using the TF data. It is clear from these figures that the presence of a
redshift-dependent shift of order $e$ greatly weakens our ability to constrain $\Omega_M$ and $\Omega_\Lambda$ from
the SNe Ia data. The Bayes factor for this model over the default $\delta$ model is 1.1 for both
MLCS and TF; the data alone are indecisive about whether such a redshift-dependent error
is present. Presuming it is present, the Bayes factor favoring nonzero $\Omega_\Lambda$ over $\Omega_\Lambda = 0$ is
reduced significantly from what is found using the default model; it is 1.1 for MLCS and
3.8 for TF.

These results are sensitive to our knowledge of the evolution. Figures 10c and 10d show
the MLCS and TF results again, but this time with $e = 0.2$ mag; the credible regions have
grown even larger in size. Now the Bayes factor for Model I over the default $\delta$ model is 1.3
for MLCS and 1.2 for TF; the data remain indecisive about the presence of an evolutionary
offset. Presuming it is present, the Bayes factor favoring nonzero $\Omega_\Lambda$ over $\Omega_\Lambda = 0$ is 0.7 for
MLCS (i.e., slightly favoring $\Omega_\Lambda = 0$) and 1.2 for TF.

As one might expect from these results, the most likely value of $\epsilon$ tends to be positive,
making the more distant sources dimmer than the nearer ones due to evolution rather than
cosmology. For example, in Figures 10c and 10d, when $\Omega_M = 1$ and $\Omega_\Lambda = 0$ (a point within
the 95.4% credible regions), $\epsilon = 0.31 \pm 0.06$ for MLCS and $\epsilon = 0.17 \pm 0.06$ for TF.

The constraints placed on $H_0$ by the SNe Ia data arise mostly from the low redshift
objects, so one would not expect allowance for evolution to drastically affect the $H_0$
inferences. The analysis bears this out. The dashed curves in Figure 8 show the marginal
distributions for $h$ based on the MLCS and TF data using Model I with $e = 0.2$; they differ
little from the distributions found using the reference model with no evolution. Similar
results are found using Model II.



^The $\chi^2$ values for the default fits are already acceptable, so one might worry that the more complicated
models are "overfitting." But the maximum likelihoods for models I and II are only slightly greater than
those found with the default model. Bayes factors account for overfitting and it is not playing a role here.
The operation of Bayes factors is discussed further in Appendix A.

4.2.2. Model II

Figure 11 shows results from analysis of the SF data using Model II. Figure 11a
shows contours of the marginal density for $\Omega_M$ and $\Omega_\Lambda$ for the $\beta = 0$ case as a reference;
these contours essentially duplicate the results of Fit C in P98. Figure 11b shows similar
contours, but allowing for a nonzero $\beta$; the prior standard deviation for $\beta$ was $b = 0.25$.
Figure 11c repeats the analysis with $b = 0.5$. Again we find that the possibility of evolution
significantly weakens the constraints on the density parameters, but if the amount of
evolution can be bounded, useful limits might result. The Bayes factor for evolution vs. no
evolution is 1.0 for $b = 0.25$ and 1.1 for $b = 0.5$, so the data alone are indecisive about the
presence or absence of this type of evolution. We find similar results when using this model
to analyze the MLCS and TF data.

Figure 12 shows how these findings depend on the prior uncertainty for $\beta$. The solid
curve shows how the Bayes factor favoring a nonzero $\Omega_\Lambda$ over $\Omega_\Lambda = 0$ depends on $b$; only
for $b \lesssim 0.1$ does the Bayes factor remain near the value of 6.8 found assuming there is no
evolution. The dashed curve shows the Bayes factor for Model II versus the default model
with no evolution; for no value of $b$ in the range of the plot can the data clearly distinguish
evolution from cosmology. This emphasizes the need to use information independent of the
$\mu$-$z$ relation to constrain evolution.

4.2.3. Flat Cosmologies 

So far we have assessed the evidence for nonzero $\Omega_\Lambda$ by comparing models with $\Omega_\Lambda = 0$
to models with arbitrary $\Omega_\Lambda$, as was done in the R98 and P98 analyses. However, many
cosmologists would consider flat models, with $\Omega_\Lambda = 1 - \Omega_M$, to be of special relevance (e.g.,
because of inflationary arguments). We have thus analyzed models constrained in this way,
using our default model and models I and II.

Figure 13 shows the marginal posterior distributions for $\Omega_M$ (and, equivalently, for
$\Omega_\Lambda = 1 - \Omega_M$) presuming a flat cosmology. The three panels show analyses of the distance
moduli reported using the MLCS and TF data with Model I (top and middle, respectively),
and using the SF data with Model II (bottom). The solid curves show results based on
the default model; the short-dashed curves show results with a small amount of evolution
allowed ($e = 0.1$ or $b = 0.25$), and the long-dashed curves show results with a larger amount
of evolution allowed ($e = 0.2$ or $b = 0.5$). As in the previous cases, the Bayes factors
comparing models with evolution to the default model are all nearly equal to one. Also,
as was found before, accounting for the possibility of evolution significantly weakens the
evidence for nonzero $\Omega_\Lambda$. However, if one restricts attention to flat models, the evidence
for nonzero $\Omega_\Lambda$ is stronger than it is if one allows nonflat cosmologies. For the default
model presuming no evolution, the Bayes factors favoring nonzero $\Omega_\Lambda$ over a flat model
with $\Omega_M = 1$ are 2.1 x 10^ (MLCS), 2.5 x 10^ (TF), and 5.0 x 10^ (SF), much larger values
than were found in the comparison using nonflat models discussed above. But these
values fall dramatically if one allows for evolution. For models with a small amount of
evolution allowed, they decrease to 20, 48, and 14, respectively, indicating positive but not
compelling evidence for nonzero $\Omega_\Lambda$. For models with a larger amount of evolution allowed,
they decrease further to 2.4, 2.5, and 2.3, indicating no significant evidence for nonzero $\Omega_\Lambda$.

4.3. Simulations 

Figure 14 elucidates why introducing the possibility of evolution weakens our ability to
constrain $\Omega_M$ and $\Omega_\Lambda$ so greatly. The thick solid curve shows $g(z)$ for the best-fit cosmology
from fits to the SF data presuming no evolution ($\Omega_M = 0.75$, $\Omega_\Lambda = 1.34$). The dotted
curve shows the same function for the flat $\Omega_M = 1$ ($\Omega_\Lambda = 0$) cosmology (which lies within
the 68.3% credible region in Figure 11c), and the dashed curve shows the evolutionary
component $\beta \ln(1 + z)$ for the best-fit value of $\beta$ given this cosmology ($\beta = 0.83$). The
thin solid curve shows the sum of the dotted and dashed curves. Over the range of
redshift covered by the data ($z \lesssim 1$) and even beyond, $(\Omega_M, \Omega_\Lambda, \beta) = (0.75, 1.34, 0)$ and
$(\Omega_M, \Omega_\Lambda, \beta) = (1, 0, 0.83)$ are indistinguishable if one allows evolution of this type, unless
one can determine $\mu$ to significantly better than the ~20% accuracy currently obtained at
$z \lesssim 1$. However, we note that the models differ substantially at larger redshift (by about
0.8 mag), which offers some hope of discerning evolution. We caution, though, that the best-fit
values of $(\Omega_M, \Omega_\Lambda)$ are likely to be different for data extending to $z \sim 2$, either with or
without evolution, so the comparison is not truly apt, and moreover $(\Omega_M, \Omega_\Lambda) = (0.75, 1.34)$
might be considered implausible intrinsically by many cosmologists. To amplify this point,
we also compare $(\Omega_M, \Omega_\Lambda, \beta) = (1, 0, 0.83)$ with another cosmology, $(\Omega_M, \Omega_\Lambda) = (0.3, 0.7)$,
within the 68.3% credible region of the no-evolution analysis. As is shown by the bold,
long-dashed curve in Figure 14, accuracies better than 10% out to $z \sim 2$ would be needed
to distinguish these cosmologies from one another. We have systematically explored a
wide range of cosmologies and found similar results: simple power law evolution can make
widely disparate cosmologies appear remarkably similar. Put another way, the differences
between cosmologies with various $\Omega_M$ and $\Omega_\Lambda$ are well mimicked by power-law evolution
to redshifts beyond those currently accessible in supernova surveys. We emphasize that
we did not choose the form of the evolution to produce this degeneracy; this is a standard
phenomenological model for evolution. We have found similar behavior with another simple
model for evolution consisting of a power law in lookback time.
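
The comparison described for Figure 14 can be explored numerically. The Python sketch below evaluates the $H_0$-independent distance modulus for a cosmology with matter and a cosmological constant, using the standard expression for the luminosity distance with curvature (assumed here to match the paper's equation for $f(z)$, which is not reproduced in this section), and adds the Model II term $\beta \ln(1 + z)$. The function names and the choice of Simpson's rule are illustrative.

```python
import math

def dimensionless_dl(z, omega_m, omega_l, n_steps=400):
    """Luminosity distance in units of c/H0 for a matter + Lambda cosmology."""
    omega_k = 1.0 - omega_m - omega_l

    def inv_e(zp):
        a = 1.0 + zp
        return 1.0 / math.sqrt(omega_m * a ** 3 + omega_k * a ** 2 + omega_l)

    # Composite Simpson's rule for the integral of dz / E(z); n_steps is even.
    step = z / n_steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * step)
    chi = total * step / 3.0

    if omega_k > 1e-12:        # open geometry
        root = math.sqrt(omega_k)
        chi = math.sinh(root * chi) / root
    elif omega_k < -1e-12:     # closed geometry
        root = math.sqrt(-omega_k)
        chi = math.sin(root * chi) / root
    return (1.0 + z) * chi

def g(z, omega_m, omega_l, beta=0.0):
    """H0-free distance modulus plus the Model II term beta * ln(1 + z)."""
    return 5.0 * math.log10(dimensionless_dl(z, omega_m, omega_l)) + beta * math.log1p(z)

# Difference between (Omega_M, Omega_Lambda, beta) = (1, 0, 0.83) and
# (0.75, 1.34, 0) at a few redshifts.
for z in (0.1, 0.5, 1.0, 2.0):
    print(z, round(g(z, 1.0, 0.0, beta=0.83) - g(z, 0.75, 1.34), 3))
```

If the zero point of the magnitude scale is fit along with the cosmology, as in a Hubble-constant-free parameterization, a constant additive difference between two curves is not observable; it is the variation of the difference with redshift that bears on the degeneracy.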

To assess how well evolution can mimic cosmology, we analyzed simulated data
consisting of $\mu$ values with added Gaussian noise at redshifts that themselves had added
Gaussian noise. The simulations were designed to roughly mimic possible future data
like those reported in P98 (this simplified the analysis since $H_0$ need not be accounted for
explicitly as it would have to be for data like those reported in R98). The redshifts of
the first 16 SNe in our simulated data sets were chosen to be similar to those of the 16
low-redshift SNe in P98 ($z \lesssim 0.1$); the redshifts for the remaining simulated data were
chosen randomly from a uniform distribution over some specified interval. We added
redshift errors with a standard deviation of 0.002, and $\mu$ errors with standard deviations
equal to those reported by P98 for the 16 low-$z$ SNe, and equal to 0.25 magnitudes for the
high-$z$ SNe (a typical value for the uncertainties reported in P98).
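
A minimal sketch of such a simulation, in Python, is given below. It generates data from the flat $\Omega_M = 1$ cosmology with Model II evolution and $\beta = 0.5$, with sample sizes of 54 and 200 as in the cases examined in this section. Several ingredients are placeholders rather than the values used in the paper: the low redshift sample is drawn uniformly from 0.01 to 0.1 instead of following the P98 redshifts, and a single error of 0.15 mag stands in for the individual low-$z$ errors reported by P98. The function names are illustrative.

```python
import math
import random

def g_flat_matter(z):
    """H0-free distance modulus for the flat Omega_M = 1 cosmology."""
    d_l = 2.0 * (1.0 + z) * (1.0 - 1.0 / math.sqrt(1.0 + z))
    return 5.0 * math.log10(d_l)

def simulate(n_high, z_range, beta, rng, n_low=16, sigma_z=0.002,
             sigma_mu_low=0.15, sigma_mu_high=0.25):
    """Return a list of (z_observed, mu_observed, sigma_mu) tuples.

    sigma_mu_low and the low-z redshift range are placeholder assumptions.
    """
    z_low = [rng.uniform(0.01, 0.1) for _ in range(n_low)]
    z_high = [rng.uniform(*z_range) for _ in range(n_high)]
    sample = [(z, sigma_mu_low) for z in z_low] + [(z, sigma_mu_high) for z in z_high]
    data = []
    for z_true, sig in sample:
        mu_true = g_flat_matter(z_true) + beta * math.log1p(z_true)
        z_obs = z_true + rng.gauss(0.0, sigma_z)   # redshift noise
        mu_obs = mu_true + rng.gauss(0.0, sig)     # distance modulus noise
        data.append((z_obs, mu_obs, sig))
    return data

rng = random.Random(1)  # fixed seed so the toy data are reproducible
small = simulate(38, (0.3, 1.0), beta=0.5, rng=rng)    # 54 points in total
large = simulate(184, (0.3, 1.5), beta=0.5, rng=rng)   # 200 points in total
print(len(small), len(large))
```

Fitting such data with and without the $\beta$ term, using the marginalization described earlier, is then a matter of evaluating the likelihood on a grid in $(\Omega_M, \Omega_\Lambda)$.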

Figure 15 shows typical results. Here we simulated data from a flat, $\Omega_M = 1$ ($\Omega_\Lambda = 0$)
cosmology with evolution described by Model II with $\beta = 0.5$. Figure 15a shows the
results of an analysis assuming no evolution, with 38 simulated SNe redshifts in the interval
[0.3, 1] (54 total simulated points). This corresponds to a sample size equal to that used
in the P98 analysis and extending over a similar range of redshift. The cross indicates the
best-fit parameter values, the dot indicates the true values, and the contours bound credible
regions of various sizes. One would reject the true model as being improbable if evolution
is ignored. Figure 15b shows a similar plot, with the number of high-$z$ SNe increased so the
total sample size is now 200, with the high-$z$ points now spread over [0.3, 1.5]. The contours
have shrunk considerably, converging around a point well away from the truth. In both
figures, the best-fit point has an excellent $\chi^2/\nu$ (53.6/52 for the small data set, 201/198
for the large one). Evolution mimics cosmology so well that standard "goodness of fit"
reasoning can lead one to conclude, mistakenly, that pure cosmology (with no evolution) is
an adequate description of data of this quality even when substantial evolution is present.

Figure 15c shows the results of an analysis of the larger data set using a model that
includes evolution; the marginal posterior (with $\beta$ marginalized) is shown. The credible
regions now contain the true model, but they are large even for a data set four times the size
of the currently published surveys, and extending to significantly higher redshift. The Bayes
factor is of order unity, showing that data of this quality are not capable of distinguishing
between models with and without evolution. This is further testimony to the approximate
degeneracy between cosmology and evolution, at least at $z \lesssim 1.5$.

The extent to which evolution corrupts the results depends both on the true cosmology
and on the amount of evolution allowed. Independent constraints on the amount of
evolution could thus play an important role in allowing useful constraints to be placed
on the cosmology. They would enter the analysis via the prior for $\beta$. Comparison of
Figures 11a through 11c shows how constraints on the amount of evolution can affect one's
final inferences.



5. Conclusions 

Systematic uncertainty may enter the analysis of any data set as a result of real
physical effects that are not accounted for explicitly. As an example, the use of observations
of distant galaxies to measure the cosmological deceleration parameter had to confront the
systematic errors introduced by the fact that not only are galaxies not standard candles,
but their luminosities also evolve with time (e.g., Tinsley 1968; Weinberg 1972; Ostriker
& Tremaine 1975; Tinsley 1977; Sandage 1988; Yoshii & Takahara 1988; Bruzual 1990;
Peebles 1993). A principal goal of this paper has been to present a study of the systematic
error due to evolution in attempts to determine $\Omega_M$ and $\Omega_\Lambda$ from observations of SNe Ia.

One focus of this paper has been to see if there are indications that the SNe Ia
population has evolved from $z \sim 0.5 - 1$ to $z \ll 1$. We have presented two arguments that
this might be so. The first is that the peak luminosities estimated for
individual SNe Ia by two different methods, MLCS and TF, are not entirely consistent with
one another at high redshifts, $z \sim 0.5$. We have asserted that the two methods very likely
sample slightly different aspects of the SN Ia mechanism, and should not be expected to
agree completely. If evolution were entirely absent, though, the differences between them
should not depend on redshift, contrary to the admittedly sketchy evidence of the data. A
second hint that SNe Ia evolve with redshift is that while the three luminosity estimators,
SF as well as MLCS and TF, reduce the dispersion of distance moduli about best-fit models
at low redshift, they do not at high redshift.

These studies were intended to give us impetus to pursue the more fundamental 
point of this paper, namely that evolution must be considered possible, even if there are 
no "smoking guns" that seem to require it. Ideally, one should attempt to constrain the 
parameters of an evolutionary model at the same time as determining the parameters of 
the cosmological model. As we stated at the outset, changes in peak SN Ia magnitude of 
order 0.1 magnitudes out to $z \sim 1$ would alter the ranges of acceptable cosmological models 
substantially. The dispersion of SNe Ia peak magnitudes at low z is approximately 0.3-0.5 
mag (Schmidt et al. 1998), which might indicate a plausible range of variation for diverse 
physical conditions. Using theoretical models, Höflich, Wheeler & Thielemann (1998) argue 
that a similar range of variation of peak luminosities could arise as a consequence of changes 
in composition which might be due to evolution. 

To get an idea of how allowing for the possibility of evolution would affect one's 
ability to constrain cosmological parameters, we considered two different models. In one, 
we assumed that there is a constant magnitude shift between low and high redshift. We 
also considered a model in which the peak magnitudes of SNe Ia evolve continuously, with 
$\delta m(z) = \beta \ln(1 + z)$. In applying these models, prior assumptions about the amplitude 
of possible magnitude changes of SNe Ia between low and high redshifts are needed to 
evaluate the systematic error that might be introduced. At present, little is known, so our 
calculations allow a range of possibilities. To do this, we assume Gaussian prior probability 
distributions for the (unknown) parameters of the evolutionary models. These priors express 
a preference for no evolution, but have adjustable standard deviations that encapsulate 
prior notions about how large possible evolutionary effects might be. The results presented 
in P98 and R98 correspond to setting these standard deviations to zero; i.e., no evolution at 
all. We adopt a more conservative viewpoint, and present results for different choices of the 
ranges of magnitude evolution that are allowed a priori. Significantly, when we permit peak 
magnitude changes out to $z \sim 1$ comparable to (and even somewhat smaller than) the range 
observed for low redshift SNe Ia (Schmidt et al. 1998), the implied systematic uncertainty in 
$\Omega_M$ and $\Omega_\Lambda$ becomes so large that the data cannot constrain these cosmological parameters 
usefully. However, our ability to determine $H_0$ is virtually unaffected by evolution. 

In order to assess the extent to which the data favor models allowing evolution over 
ones without evolution, we computed Bayes factors. The Bayes factor between classes of 
models with and without luminosity evolution is equivalent to the odds ratio between 
them if there is no a priori reason to prefer one over the other. In all cases, we found 
that the Bayes factors are of order unity, which means that the data themselves do not 
favor either model. If we accounted for a prior prejudice that evolution does occur, the 
odds would disfavor models in which the SNe Ia population has the same properties at all 
redshifts. 

The two models we have considered illustrate how well evolution can mimic cosmology. 
The less realistic model merely allowed a shift in the magnitudes of high z SNe Ia relative 
to low z SNe Ia by a fixed (but uncertain) amount, $\delta m$. Since the SNe Ia in the R98 sample 
were predominantly at $z \sim 0.3-0.5$, the cosmological magnitude shift (relative to Hubble's 
law or any other fiducial cosmology) varies little over the entire redshift range they span. 
Clearly, for the high z SNe Ia in this sample, one only knows that there is a total magnitude 
shift between $z \lesssim 0.1$ and $z \sim 0.5$, not how much is due to cosmology and how much to 
evolution. The characteristic magnitude shifts due to evolution needed for a cosmological 
model with $(\Omega_M, \Omega_\Lambda) = (1, 0)$ are ~ 0.2-0.3. Ultimately, a model in which there is simply 
a constant magnitude difference between low and high z SNe Ia should fail to model data 
spanning a large range of redshifts. 

More daunting is the success of models allowing a continuous magnitude shift, 
$\delta m(z) = \beta \ln(1 + z)$. While it is unsurprising that such models would be approximately 
degenerate with cosmology at low z, where the combined magnitude shift, relative to 
Hubble flow, is $[1.086(1 - \Omega_M/2 + \Omega_\Lambda) + \beta] z$ (e.g., Weinberg 1972), it is remarkable that a 
continuous magnitude shift with this simple form cannot be discerned out to at least $z \sim 1$. 
Our simulations show that even if there is truly no evolution, so that reality corresponds to 
certain values of $\Omega_M$ and $\Omega_\Lambda$ with $\beta = 0$, models with $\beta \neq 0$ and density parameters 
that differ from the true $(\Omega_M, \Omega_\Lambda)$ 
yield apparent magnitudes that are indistinguishable from the truth within differences in 
distance modulus $\lesssim 0.1$ mag. Differences between select cosmological models may be larger 
at higher z, but often remain within ~ 0.1 mag out to $z \sim 2$. 
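
As a rough illustration of the low-redshift degeneracy, the following Python sketch evaluates the first-order coefficient $1.086(1 - \Omega_M/2 + \Omega_\Lambda) + \beta$ and solves for the value of $\beta$ that lets one cosmology reproduce the low-z slope of another. The function names and the example parameter values are ours and purely illustrative; the expression is only the leading-order term and should not be trusted beyond $z \ll 1$.

```python
import math

# Pogson's ratio, 2.5 log10(e) ~ 1.086: converts a fractional (ln) change into magnitudes.
POGSON = 2.5 * math.log10(math.e)


def low_z_slope(omega_m, omega_lambda, beta=0.0):
    """Coefficient of z in the magnitude shift relative to Hubble flow (valid for z << 1)."""
    return POGSON * (1.0 - omega_m / 2.0 + omega_lambda) + beta


def mimicking_beta(true_cosmology, trial_cosmology):
    """Evolution amplitude that lets the trial cosmology match the low-z slope of a
    non-evolving true cosmology. Both arguments are (omega_m, omega_lambda) tuples."""
    return low_z_slope(*true_cosmology) - low_z_slope(*trial_cosmology)


if __name__ == "__main__":
    truth = (0.3, 0.7)  # illustrative values only, not taken from the paper
    trial = (1.0, 0.0)
    beta = mimicking_beta(truth, trial)
    print(f"beta needed at first order in z: {beta:.3f}")
    print(low_z_slope(*truth), low_z_slope(*trial, beta=beta))
```

At this order only the single combination of $\Omega_M$, $\Omega_\Lambda$ and $\beta$ is constrained, which is why the two printed slopes coincide.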

We also use simulations to explore the converse situation where we neglect evolution 
in the analysis of samples of evolving SNe Ia. As an example, we simulated a set of 200 
SNe Ia distance moduli, including 184 high-z SNe with redshifts uniformly distributed over 
0.3 < z < 1.5, in a cosmological model with $(\Omega_M, \Omega_\Lambda, \beta) = (1.0, 0.0, 0.5)$. We then analyzed 
the data with evolution neglected entirely. The result was that given enough SNe Ia, the 
analysis would pick out a small range of "allowed" values of $(\Omega_M, \Omega_\Lambda)$, but centered around 
incorrect values. The true cosmology was well outside the $3\sigma$ credible region for these 
simulations, yet the (incorrect) best-fit model would be judged excellent by a standard 
goodness-of-fit test. 
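
To make the comparison between evolving and non-evolving models concrete, here is a minimal Python sketch of a distance modulus in an FRW cosmology with an added shift $\beta \ln(1+z)$. It is not the code used for the simulations described above: the function names, the Simpson integrator, and the default `h0 = 65` km/s/Mpc are our assumptions, and the sketch assumes parameter values for which the expansion rate stays real over the redshift range (no bounce).

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def _simpson(func, a, b, n=2000):
    """Composite Simpson's rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0


def luminosity_distance(z, omega_m, omega_lambda, h0=65.0):
    """Luminosity distance in Mpc; h0 (km/s/Mpc) is an assumed illustrative value."""
    omega_k = 1.0 - omega_m - omega_lambda

    def inv_e(zp):
        return 1.0 / math.sqrt(
            omega_m * (1 + zp) ** 3 + omega_k * (1 + zp) ** 2 + omega_lambda
        )

    chi = _simpson(inv_e, 0.0, z)
    if omega_k > 1e-12:  # open geometry
        root = math.sqrt(omega_k)
        chi = math.sinh(root * chi) / root
    elif omega_k < -1e-12:  # closed geometry
        root = math.sqrt(-omega_k)
        chi = math.sin(root * chi) / root
    return (1.0 + z) * (C_KM_S / h0) * chi


def distance_modulus(z, omega_m, omega_lambda, beta=0.0, h0=65.0):
    """Distance modulus (z > 0) including the evolutionary shift beta * ln(1 + z)."""
    d_l = luminosity_distance(z, omega_m, omega_lambda, h0)
    return 5.0 * math.log10(d_l) + 25.0 + beta * math.log(1.0 + z)


def max_difference(model_a, model_b, redshifts):
    """Largest |difference| in distance modulus between two
    (omega_m, omega_lambda, beta) models over the given redshifts."""
    return max(
        abs(distance_modulus(z, *model_a) - distance_modulus(z, *model_b))
        for z in redshifts
    )
```

Calling `max_difference((1.0, 0.0, 0.5), (0.3, 0.7, 0.0), [0.1 * k for k in range(1, 16)])`, for instance, gives the largest magnitude gap between an evolving matter-dominated model and a non-evolving model with a cosmological constant; the second set of parameters is an arbitrary choice for illustration.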

What is needed to separate evolution from cosmology is both detections of greater 
numbers of SNe Ia at high redshift with detailed measurements of light curves and spectra, 
and, equally important, a better physical understanding of the SN Ia process. In particular, 
one would like to be able to link the Phillips relations, lightcurve risetimes and spectra, 
uniquely, to internal conditions in the explosions themselves, to be able to understand how 
they might evolve with redshift (see, e.g., von Hippel, Bothun, & Schommer 1997; Höflich, 
Wheeler, and Thielemann 1998; Dominguez et al. 1999). This would allow construction 
of realistic, not phenomenological, models for evolution, and one might hope to be able 
to constrain the parameters of these models along with cosmological ones. The analogue 
in galactic astronomy is the use of population synthesis models to study the cosmological 
evolution of the luminosity function, which might permit, given enough data, simultaneous 
fits for cosmological parameters (Yoshii & Takahara 1988, Bruzual 1990). Such detailed 
physical modelling might lead to a detailed, quantitative connection between the peak 
luminosities of SNe Ia and their spectra, which would allow additional information to be 
useful quantitatively in fitting for $\Omega_M$ and $\Omega_\Lambda$. R98 and P98 have argued, using the spectral 
data, that there is no compelling evidence for evolution, but that does not translate into a 
convincing argument against evolution unless the salient features of the source spectra can 
be connected unambiguously to peak luminosity. In fact, since this paper was submitted, 
Riess, Filippenko, Li & Schmidt (1999) have claimed that the rise times of low and high 
redshift SNe Ia are different even though earlier studies found no comparably strong 
evidence for spectral evolution. 

In the end, what all cosmologists want to know is the probability that the cosmological 
constant is nonzero. The Bayes factor provides straightforward mathematical machinery 
for doing this calculation, whether or not evolution is included in the analysis. When the 
possibility of evolution is not included in the analysis, and no prior assumptions are made 
about the spatial geometry of the Universe, the Bayes factor for $\Omega_\Lambda \neq 0$ compared to 
$\Omega_\Lambda = 0$ is B = 5.4 using the MLCS method, B = 6.8 using the SF method, and B = 86 
using the TF method, which, if one is not prejudiced either way, only favors nonzero $\Omega_\Lambda$ 
equivocally. (There may be reasons to be prejudiced one way or the other; see for example 
Turner 1999 for a theoretical cosmologist's point of view.) When the possibility of evolution 
is accounted for in the analysis, the values of the analogous Bayes factors depend on one's 
prior assumptions, but rather conservatively $B \lesssim 1$. Thus, if we do not discriminate among 
open, closed and flat cosmological models, the data alone do not choose between $\Omega_\Lambda \neq 0$ 
and $\Omega_\Lambda = 0$ once the possibility of evolution is taken into account. However, if the Universe 
is presumed to be flat spatially, then the case for $\Omega_\Lambda \neq 0$ is stronger. If evolution is 
presumed not to occur, we find Bayes factors B = 2.1 x 10^ (MLCS), 2.5 x 10® (TF) and 
5.0 x 10^ (SF), decisive odds in favor of nonzero $\Omega_\Lambda$. Weak evolution (e = 0.1 in Model 
I or 6 = 0.25 in Model II) lowers these values to B = 20 (MLCS), 4.8 (TF) and 14 (SF), 
which still favors $\Omega_\Lambda \neq 0$ positively but not nearly as persuasively. If evolution is allowed 
to be somewhat more pronounced, but still at a plausible level (e = 0.2 or 6 = 0.5), the 
Bayes factors fall to B = 2.4 (MLCS), 2.5 (TF) and 2.3 (SF), which is scant evidence for 
a non-vanishing cosmological constant. Once again, the ability of the data to distinguish 
$\Omega_\Lambda \neq 0$ from $\Omega_\Lambda = 0$ depends sensitively on prior assumptions about evolution of SNe Ia, 
which underscores the importance of placing independent constraints on the possible range of 
variation of their peak luminosities with redshift. 



A. Bayes Factors 

In Bayesian inference, to form a judgement about an hypothesis $H_i$, we calculate its 
probability, $p(H_i|D,I)$, conditional on the data ($D$) and any other relevant information 
at hand ($I$). The desired probability $p(H_i|D,I)$ is not usually assignable directly; instead 
we must calculate it from other simpler probabilities using the rules of probability 
theory. Prominent among these is Bayes's theorem, expressing this posterior (i.e., after 
consideration of the data) probability in terms of a prior probability for $H_i$ and a likelihood 
for $H_i$, 

$p(H_i|D,I) = p(H_i|I)\, \frac{\mathcal{L}(H_i)}{p(D|I)}$,    (A1) 

where the likelihood for $H_i$, $\mathcal{L}(H_i)$, is a shorthand notation for the sampling probability 
for $D$ presuming $H_i$ to be true, $p(D|H_i, I)$. The likelihood notation and terminology 
emphasizes that it is the dependence of the sampling probability on $H_i$ (rather than $D$) 
that is of interest for calculating posterior probabilities. The term in the denominator is 
the prior predictive probability for the data and plays the role of a normalization constant. 
It can be calculated according to 

$p(D|I) = \sum_i p(H_i|I)\, \mathcal{L}(H_i)$.    (A2) 

We see from this equation that the prior predictive probability is the average likelihood for 
the hypotheses, with the prior being the averaging weight. It is also sometimes called the 
marginal probability for the data. 

For estimating the values of the parameters $\theta$ of some model, the background 
information is the assumption that the parameterized model under consideration is true; we 
denote this by $M$ (this may include any other information we have about the parameters 
apart from that provided by $D$; for example, previously obtained data). The posterior 
probability for any hypothesis about continuous parameters can be calculated from the 
posterior probability density function (PDF), which we may calculate with a continuous 
version of Bayes's theorem: 

$p(\theta|D,M) = p(\theta|M)\, \frac{\mathcal{L}(\theta)}{p(D|M)}$.    (A3) 

Both the posterior and the prior are PDFs in this equation; we continue to use the $p(\,)$ 
notation, letting the nature of the argument dictate whether a probability or PDF is meant. 
In this case, the normalization constant is given by an integral: 

$p(D|M) = \int d\theta\, p(\theta|M)\, \mathcal{L}(\theta)$.    (A4) 

The normalization constant is now the average likelihood for the model parameters. 

When comparing rival models, $M_i$, each with parameters $\theta_i$, we return to the discrete 
version of Bayes's theorem in equation (A1), using $H_i = M_i$ for the hypotheses, and taking 
the background information to be $I = M_1 + M_2 + \cdots$ (denoting the proposition, "Model 
$M_1$ is true or model $M_2$ is true or . . ."). The likelihood for model $M_i$ is $p(D|M_i, I)$; but 
since the joint proposition $(M_i, I)$ is equivalent to the proposition $M_i$ by itself, we have 
$\mathcal{L}(M_i) = p(D|M_i)$. Thus the likelihood for a model in a model comparison calculation is 
equal to the normalization constant we would use when doing parameter estimation for 
that model, given by an equation like equation (A4). In other words, the likelihood for a 
model (as a whole) is the average likelihood for its parameters. 

It is convenient and common to report model probabilities via odds, ratios of 
probabilities of models. The (posterior) odds for $M_i$ over $M_j$ is 

$O_{ij} \equiv \frac{p(M_i|D,I)}{p(M_j|D,I)} = \frac{p(M_i|I)}{p(M_j|I)}\, \frac{p(D|M_i)}{p(D|M_j)} \equiv \frac{p(M_i|I)}{p(M_j|I)}\, B_{ij}$,    (A5) 

where the first factor is the prior odds, and the ratio of model likelihoods, $B_{ij}$, is called the 
Bayes factor. When the prior information does not indicate a preference for one model 
over another, the prior odds is unity and the odds is equal to the Bayes factor. Kass and 
Raftery (1995) provide a comprehensive review of Bayes factors, and Wasserman (1997) 
provides a survey of their use and methods for calculating them. When the prior odds does 
not strongly favor one model over another, the Bayes factor can be interpreted just as one 
would interpret an odds in betting; Table 2 summarizes the recommended interpretation of 
Kass and Raftery. 
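
The relation between Bayes factors, prior odds and posterior odds can be sketched in a few lines of Python. The category boundaries below (3, 20 and 150) are the ones we recall from Kass and Raftery (1995); we assume, without being able to confirm it here, that Table 2 uses the same boundaries. All function names are ours.

```python
def posterior_odds(bayes_factor, prior_odds=1.0):
    """Posterior odds = prior odds times Bayes factor."""
    return prior_odds * bayes_factor


def posterior_probability(bayes_factor, prior_odds=1.0):
    """Probability of the favored model when only two models are being compared."""
    odds = posterior_odds(bayes_factor, prior_odds)
    return odds / (1.0 + odds)


def evidence_category(bayes_factor):
    """Verbal label for a Bayes factor; thresholds are assumed (Kass & Raftery 1995)."""
    if bayes_factor <= 0:
        raise ValueError("a Bayes factor must be positive")
    # Strength of evidence regardless of which model is favored.
    strength = max(bayes_factor, 1.0 / bayes_factor)
    if strength <= 3:
        return "not worth more than a bare mention"
    if strength <= 20:
        return "positive"
    if strength <= 150:
        return "strong"
    return "very strong"


if __name__ == "__main__":
    for b in (2.4, 5.4, 14.0):
        print(b, evidence_category(b), round(posterior_probability(b), 3))
```

With unit prior odds the posterior probability is simply B/(1 + B), so a Bayes factor of order unity leaves the two models nearly equally probable.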

The Bayes factor is a ratio of prior predictive probabilities; it compares how rival 
models predicted the observed data. Simple models with no or few parameters have their 
predictive probability concentrated in a small part of the sample space. The additional 
parameters of complicated models allow them to assign more probability to other regions 
of the sample space, but since the predictive probability must be normalized, this broader 
explanatory power comes at the expense of reducing the probability for data lying in the 
regions accessible to simpler models. As a result, model comparison using Bayes factors 
tends to favor simpler models unless the data are truly difficult to account for with such 
models. Bayes factors thus implement a kind of automatic and objective "Ockham's razor" 
(Jaynes 1979; Jefferys and Berger 1992). 

This notion of simplicity is somewhat subtle, but in some simple situations it accords 
well with our intuition that models with more parameters are more complicated and should 
only be preferred if they account for the data significantly better than a simpler alternative. 
Because Bayes factors are ratios of average likelihoods, rather than the maximum likelihoods 
that are used for model comparison in frequentist statistics, they penalize models for the 
sizes of their parameter spaces. A simple, approximate calculation of the average parameter 
likelihood given by equation (A4) elucidates how this comes about. 



First, we assume that the data are informative in the sense of producing a likelihood 
function that is strongly localized compared to the prior. Suppose that the scale of variation 
of the prior is $\Delta\theta$, and the scale of variation of the likelihood is $\delta\theta \ll \Delta\theta$. If the likelihood 
is maximized at $\theta = \hat\theta$, then we find 

$p(D|M) \approx p(\hat\theta|M) \int d\theta\, \mathcal{L}(\theta)$.    (A6) 

Since the prior is normalized with respect to $\theta$, $p(\hat\theta|M)$ will be roughly equal to $1/\Delta\theta$. 
The integral will be roughly equal to the product of the peak and width of the likelihood, 
$\mathcal{L}(\hat\theta)\,\delta\theta$. Thus, 

$p(D|M) \approx \mathcal{L}(\hat\theta)\, \frac{\delta\theta}{\Delta\theta}$.    (A7) 

We find that the likelihood for a model is approximately given by the maximum likelihood 
for its parameters, multiplied by a factor that is always < 1 that is a measure of how the 
size of the probable part of the parameter space changes when we account for the data. 
This latter factor is colloquially known as the Ockham factor. To see why, consider the 
case of nested models: $M_1$ and $M_2$ share parameters $\theta$, but $M_2$ has additional parameters 
$\phi$. In such cases, it is not uncommon for the prior and posterior ranges for $\theta$ to be 
comparable for both models (this is not the case in the present work, however). Then the 
Bayes factor in favor of the more complicated model is approximately given by 

$B_{21} \approx \frac{\mathcal{L}(\hat\theta, \hat\phi)}{\mathcal{L}(\hat\theta)}\, \frac{\delta\phi}{\Delta\phi}$.    (A8) 

Thus the data will favor $M_2$ only if the maximum likelihood ratio is high enough to offset 
the Ockham factor $\delta\phi/\Delta\phi$, which will be < 1 if the data contain any information about $\phi$ 
(and cannot be > 1 in any case). This is in contrast to the frequentist approach, where only the ratio of maximum 
likelihoods is used. This ratio cannot disfavor $M_2$; one thus requires the likelihood ratio 
to exceed some critical value before preferring $M_2$, on the grounds that one should prefer 
the simpler model a priori. Unfortunately, the critical value is set in a purely subjective 
and ad hoc manner, and comparisons using likelihood ratios can be inconsistent (in the 
formal statistical sense of giving the incorrect answer when the amount of data becomes 
infinite). The Bayesian approach can (and often does) prefer the simpler model even when 
both models are given equal prior probabilities, and the critical likelihood ratio needed to 
just prefer $M_2$ is determined by the likelihood functions and the size of the parameter space 
searched. The odds is known to be a consistent statistic for choosing between models. 
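
The Ockham-factor approximation can be checked on a toy problem. The Python sketch below assumes a Gaussian likelihood of standard deviation `sigma` and a flat prior on a finite interval, and identifies the likelihood width $\delta\theta$ with $\sigma\sqrt{2\pi}$; these choices, and all function names, are ours and serve only to illustrate the argument.

```python
import math


def average_likelihood(l_max, theta_hat, sigma, prior_lo, prior_hi):
    """Exact prior-averaged Gaussian likelihood under a flat prior on [prior_lo, prior_hi]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - theta_hat) / (sigma * math.sqrt(2.0))))

    area = l_max * sigma * math.sqrt(2.0 * math.pi) * (cdf(prior_hi) - cdf(prior_lo))
    return area / (prior_hi - prior_lo)


def ockham_approximation(l_max, sigma, prior_lo, prior_hi):
    """Maximum likelihood times the Ockham factor (likelihood width / prior width)."""
    delta_theta = sigma * math.sqrt(2.0 * math.pi)  # assumed definition of the width
    return l_max * delta_theta / (prior_hi - prior_lo)


def critical_likelihood_ratio(sigma_phi, prior_width_phi):
    """Maximum-likelihood ratio at which the nested-model Bayes factor is about 1."""
    return prior_width_phi / (sigma_phi * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    exact = average_likelihood(1.0, 0.0, 0.1, -5.0, 5.0)
    approx = ockham_approximation(1.0, 0.1, -5.0, 5.0)
    print(exact, approx, critical_likelihood_ratio(0.1, 10.0))
```

When the prior is much wider than the likelihood the two numbers agree, and the critical likelihood ratio grows in proportion to the prior range of the extra parameter.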

The approximations leading to the simple result of equation (A8) are not valid for the 
present work, so a simple "Ockham's razor" interpretation of our results is not possible. 
Although the default model is nested in the models that have z-dependent systematic 
errors, it is clear from the figures that the addition of the systematic error parameters 
(corresponding to $\phi$ in the above analysis) greatly affects inferences of the cosmological 
parameters (corresponding to $\theta$). Thus the $\delta\theta$ factors (here associated with the cosmological 
parameters) do not approximately cancel in the Bayes factor. Moreover, inferences for the 
$\theta$ and $\phi$ parameters are highly correlated in the SNe Ia problem, so it is not possible to 
identify separate $\delta\theta$ and $\delta\phi$ factors separately quantifying the uncertainties in the nested 
and additional parameters. We do know that the maximum likelihoods (e.g., minimum 
$\chi^2$ values) are comparable for models with and without z-dependent systematic errors. 
The more complicated models are not improving the best fit substantially, but rather the 
additional parameter allows one to make the fit nearly as good as the best fit throughout 
a large region of the parameter space (because of the near-degeneracy of evolution and 
cosmology). It is this increase of the acceptable volume of parameter space that accounts 
for the Bayes factors slightly favoring the more complicated models here. 

As is clear from equation (A8), the prior ranges for parameters play an important role 
in Bayesian model comparison. This is in contrast to their role in parameter estimation, 
where in Bayes's theorem the prior range factor appears in both the numerator (through 
the prior) and the denominator (through the average likelihood) and thus cancels, typically 
having a negligible effect on inferences (though the range itself cancels, some effect can 
remain due to truncation of the tails of the likelihood). In particular, parameter estimation 
is typically well-behaved even when one uses improper (non-normalizable) priors, such 
as flat priors with infinite ranges. But model comparison fails when the priors for any 
parameters not common to all models are improper, because the Ockham factors associated 
with those parameters vanish. This may at first appear to be troubling (or at best a 
nuisance), but a similar dependence on the prior range of parameters is acknowledged to be 
necessary even in frequentist treatments of many problems. For example, consider detection 
of a periodic signal in a noisy time series using a power spectrum estimator. This is a model 
comparison problem (comparing a model without a signal to one with a periodic signal), 
and in fact the spectral power is simply related to the likelihood for a periodic (sinusoidal) 
signal. In frequentist analyses, one cannot simply use the number of standard deviations the 
spectral peak is above the null expectation to assess the significance of a signal; one must 
also take into account the number of statistically independent frequencies examined, which 
depends on the frequency range searched and on the number and locations of frequencies 
examined within that range. Similar considerations arise in searches for features in energy 
spectra, or searches for sources in images; one must take into account the number and 
locations of points searched in order to properly assess the significance of a detection. 
The results of the corresponding Bayesian calculations similarly depend on the ranges of 
parameters searched (but not on the number and locations of the parameter values used). 
Bayes's theorem indicates that the sizes of parameter spaces (i.e., search ranges) must be 
taken into account whenever we compare models; such considerations should not be unique 
to the few applications where they have been recognized to be important in conventional 
analyses. 



B. Statistical Methodology 

As in the analyses of R98 and P98, we adopt the Bayesian approach for inferring the 
cosmological parameters $\Omega_M$ and $\Omega_\Lambda$, extending their analyses to include parameterized 
systematic and evolutionary components. The additional parameters are dealt with by 
marginalizing (as the R98 analysis did with $H_0$ and the P98 analysis did with the SF fitting 
parameters). Many of the needed marginalizations can be done analytically; this Appendix 
describes these calculations. Some remaining marginalizations (including calculation of 
Bayes factors) were done numerically with various methods including straightforward 
quadrature, adaptive quadrature, and Laplace's method; application of these methods to 
Bayesian integrals is surveyed in Loredo (1999). 



B.1. Basic Framework 

Let $D_i$ denote the data associated with SN number $i$, and $D$ denote all the data 
associated with the SNe in a particular survey. Let $\mathcal{C}$ denote the cosmological 
parameters, $\mathcal{C} = (H_0, \Omega_M, \Omega_\Lambda)$, and $\mathcal{S}$ denote possible extra parameters associated with 
modelling evolution or other sources of systematic errors. Our task is to find the posterior 
distribution for these parameters given the data and some model, $M$. Actually, we are 
ultimately interested in the marginal distribution for $\Omega_M$ and $\Omega_\Lambda$, found by marginalizing: 
$p(\Omega_M, \Omega_\Lambda|D, M) = \int dH_0 \int d\mathcal{S}\, p(\mathcal{C}, \mathcal{S}|D, M)$. Bayes's theorem gives the joint posterior 
distribution for $\mathcal{C}$ and $\mathcal{S}$, 



$p(\mathcal{C}, \mathcal{S}|D, M) \propto p(\mathcal{C}|M)\, p(\mathcal{S}|M)\, \mathcal{L}(\mathcal{C}, \mathcal{S})$.    (B1) 

The first factor is the prior for $\mathcal{C}$, which we will take to be flat over the ranges shown in 
our plots (or flat in the logarithm for $H_0$; see below). The second factor is the prior for 
$\mathcal{S}$ which we assume is independent of $\mathcal{C}$; we discuss it further in the context of specific 
models, below. The last factor is the likelihood for $\mathcal{C}$ and $\mathcal{S}$, which we have abbreviated 
as $\mathcal{L}(\mathcal{C}, \mathcal{S}) = p(D|\mathcal{C}, \mathcal{S}, M)$. Rigorous calculation of this likelihood is very complicated, 
requiring introduction and estimation of many additional parameters, including parameters 
from the lightcurve model and parameters for characteristics of the individual SNe (such 
as their apparent and absolute magnitudes, redshifts, K-corrections, etc.). With several 
simplifying assumptions, the final result is relatively simple; it can be written as the product 
of independent Gaussians for the redshifts and distance moduli of the SNe integrated over 
the redshift uncertainty, so that 

$\mathcal{L}(\mathcal{C}, \mathcal{S}) = \prod_i \int dz_i\, \frac{1}{s_i \sqrt{2\pi}} \exp\left[ -\frac{(\hat\mu_i - F(z_i))^2}{2 s_i^2} \right] \frac{1}{w_i \sqrt{2\pi}} \exp\left[ -\frac{(\hat z_i - z_i)^2}{2 w_i^2} \right]$.    (B2) 

Here $\hat\mu_i$ is the best-fit distance modulus for SN number $i$, $s_i$ is its uncertainty, $\hat z_i$ is the 
best-fit cosmological redshift, and $w_i$ is its uncertainty (mostly due to the source's peculiar 
velocity). The function $F(z_i)$ gives the true distance modulus for a SN Ia at redshift $z_i$; 
in the absence of systematic or evolutionary terms, it is given by $f(z_i)$ in equation (1). 
For the results reported in P98, two complications appear in the likelihood. First, the 
factors are not independent; the use of common photometric calibration data for groups of 
SNe Ia that are studied together introduces correlations. P98 have reported a correlation 
matrix accounting for these, but the correlations are very small and we have neglected them 
here. In addition, one of the parameters defining the lightcurve model (the $\alpha$ parameter 
described in § 2, above) appears explicitly in the P98 likelihood so that it can be estimated 
jointly with the cosmological parameters. This parameter would appear in the $\hat\mu_i$ estimates 
in equation (B2). The data tabulated in P98 use the best-fit $\alpha$, however, so we could 
not account for the uncertainty of $\alpha$ in our analysis. The close similarity between our 
contours in Figure 11a and those presented in P98 argues that rigorous accounting for the 
uncertainty in $\alpha$ plays only a minor role in the final results. 

As was done in R98 and P98, we approximate the $z_i$ integrals in equation (B2) by 
linearizing the $z_i$ dependence of $F(z_i)$ about $\hat z_i$ and performing the resulting convolution of 
Gaussians analytically. The result is 

$\mathcal{L}(\mathcal{C}, \mathcal{S}) = \prod_i \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left[ -\frac{(\hat\mu_i - \mu_i)^2}{2 \sigma_i^2} \right]$,    (B3) 

where $\mu_i = F(\hat z_i)$ and 

$\sigma_i^2 = s_i^2 + [F'(\hat z_i)]^2 w_i^2$.    (B4) 

The total variance $\sigma_i^2$ depends on $\mathcal{C}$ through $F'(z)$. But this dependence is weak in general, 
and $F'(z)$ is actually independent of $\mathcal{C}$ at low redshift in the pure cosmology model, with 

$F'(z) = \frac{5}{z \ln 10}$.    (B5) 

We follow the practice of R98 and simply use this formula for all redshifts. We use the 
same formula for models with systematic error terms that introduce an additional (weak) 
dependence on redshift and $\mathcal{S}$; the redshift uncertainties are negligibly small at high 
redshifts where such dependences might become important, so the dependence of $\sigma_i^2$ on 
redshift is negligible. It is possible to do the $z_i$ integrals in equation (B2) accurately using 
Gauss-Hermite quadrature. We have done some calculations this way and verified that the 
final inferences are negligibly affected by the redshift integral approximations. 



Equation (B3) is the starting point for the analyses reported in the body of this work. 
It is of a simple form: $-2$ times the log-likelihood is of the form of a $\chi^2$ statistic. This is 
the same likelihood we would have written down had we simply presumed at the outset 
that the reported $\hat\mu_i$ values were equal to some underlying true values given by $F(\hat z_i)$ plus 
some added noise $n_i$: 

$\hat\mu_i = F(\hat z_i) + n_i$,    (B6) 

where the probability distribution for the value of $n_i$ is a zero-mean Gaussian with standard 
deviation $\sigma_i$. 
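
A minimal Python sketch of this likelihood follows. It assumes, as our reading of equations (B3) through (B5), that the total variance adds the distance-modulus variance to the redshift variance propagated through the low-redshift slope $5/(z \ln 10)$; the function names and the numbers in the usage example are hypothetical.

```python
import math


def distance_modulus_slope(z):
    """Low-redshift derivative of the distance modulus, 5 / (z ln 10) (assumed form)."""
    return 5.0 / (z * math.log(10.0))


def total_sigma(s, w, z_hat):
    """Total uncertainty: distance-modulus error s combined in quadrature with the
    redshift error w propagated through the low-z slope."""
    return math.hypot(s, distance_modulus_slope(z_hat) * w)


def chi_square(mu_hat, mu_model, sigma):
    """Sum of squared, uncertainty-weighted residuals (-2 ln L up to a constant)."""
    return sum(((m - f) / sg) ** 2 for m, f, sg in zip(mu_hat, mu_model, sigma))


def log_likelihood(mu_hat, mu_model, sigma):
    """Log of a product of independent Gaussians, including normalization."""
    norm = sum(math.log(sg * math.sqrt(2.0 * math.pi)) for sg in sigma)
    return -0.5 * chi_square(mu_hat, mu_model, sigma) - norm


if __name__ == "__main__":
    # A redshift error of 0.001 (about 300 km/s) at z = 0.01 with no other error:
    print(total_sigma(0.0, 0.001, 0.01))
```

The usage line shows why the redshift term matters only nearby: the same velocity error at z = 0.5 contributes fifty times less.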



B.2. FRW Cosmology 

Presuming a FRW cosmology and no systematic errors, equation (B6) can be written, 

$\hat\mu_i = f_i + n_i = g_i - \eta + n_i$,    (B7) 

where $f_i = f(\hat z_i)$ is the magnitude-redshift relation, which we can separate into a part 
$g_i = g(\hat z_i)$ that depends implicitly only on $\Omega_M$ and $\Omega_\Lambda$, and a part $\eta$ that contains 
the $H_0$ dependence ($\eta$ is defined in the body of this work). Define the quadratic form $Q$ according to 

$Q = \sum_i \frac{(\hat\mu_i - g_i + \eta)^2}{\sigma_i^2}$.    (B8) 

This is the $\chi^2$ statistic used in R98; the joint likelihood for $h$, $\Omega_M$, and $\Omega_\Lambda$ is simply 
proportional to $e^{-Q/2}$. We can analytically marginalize over $h$ (or equivalently, over $\eta$) to 
find the marginal likelihood for the density parameters. To do so, we must assign a prior 
for $h$. We use the standard noninformative "reference" prior for a positive scale parameter, 
a prior flat in the logarithm and thus scale-invariant (Jeffreys 1961; Jaynes 1968; Yang and 
Berger 1997). This corresponds to a prior that is flat in $\eta$. We bound this prior over some 
range $\Delta\eta$ (with limits corresponding to h = 0.1 and h = 1, so $\Delta\eta$ = ln[10]). The prior range 
has negligible effect on all our results (so long as it contains the peak of the likelihood) 
because the $H_0$ parameter is common to all models, so the prior range cancels out of all 
probability ratios. Thus we could let it become infinite, but it is a good practice in Bayesian 
calculations to always adopt proper (i.e., normalizable) priors, especially if Bayes factors 
(ratios of normalization constants) are of interest. 

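Using the log-flat prior, the marginal likelihood for the density parameters is 

$\mathcal{L}(\Omega_M, \Omega_\Lambda) = \frac{1}{\Delta\eta} \int d\eta\, e^{-Q/2}$.    (B9) 

To do the integral, complete the square in $Q$ as a function of $\eta$, writing 

$Q = \frac{(\eta - \hat\eta)^2}{s^2} + q(\Omega_M, \Omega_\Lambda)$,    (B10) 

where 

$\frac{1}{s^2} = \sum_i \frac{1}{\sigma_i^2}$,    (B11) 

$\hat\eta = s^2 \sum_i \frac{g_i - \hat\mu_i}{\sigma_i^2}$,    (B12) 

and the $(\Omega_M, \Omega_\Lambda)$-dependence is isolated in 

$q(\Omega_M, \Omega_\Lambda) = \sum_i \frac{(\hat\mu_i - g_i + \hat\eta)^2}{\sigma_i^2}$.    (B13) 

The integral in equation (B9) is thus simply an integral over a Gaussian in $\eta$ located at $\hat\eta$ 
with standard deviation $s$; $\hat\eta$ is the best-fit (most probable) value of $\eta$ given $\Omega_M$ and $\Omega_\Lambda$, 
and $s$ is its conditional uncertainty. As long as $\hat\eta$ is inside the prior range and $s \ll \Delta\eta$, the 
value of this integral is well approximated by $s\sqrt{2\pi}$, so that 

$\mathcal{L}(\Omega_M, \Omega_\Lambda) = \frac{s\sqrt{2\pi}}{\Delta\eta}\, e^{-q/2}$.    (B14) 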

This is the marginal likelihood one would use to infer the density parameters in the absence 
of any systematic error terms. Note from equation (B13) that the quadratic form is just 
what one would obtain by calculating the "profile likelihood" for the density parameters 
(the likelihood maximized over the nuisance parameters, a frequentist method sometimes 
used to approximately treat nuisance parameters). Since the uncertainty $s$ is independent 
of $\Omega_M$ and $\Omega_\Lambda$, it follows from equation (B14) that the marginal likelihood is proportional 
to the profile likelihood in this problem. 
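
The analytic marginalization over $\eta$ can be checked numerically. The sketch below assumes the standard completed-square quantities for a sum of Gaussian terms in which $\eta$ enters as $\hat\mu_i - g_i + \eta$ (inverse-variance weights, weighted-mean offset, residual quadratic form), and compares the closed form $s\sqrt{2\pi}\,e^{-q/2}/\Delta\eta$ with a brute-force trapezoid integral. Function names and the test numbers are hypothetical.

```python
import math


def complete_the_square(mu_hat, g, sigma):
    """Return (s, eta_hat, q) such that Q(eta) = (eta - eta_hat)^2 / s^2 + q."""
    weights = [1.0 / (sg * sg) for sg in sigma]
    s2 = 1.0 / sum(weights)
    eta_hat = s2 * sum(w * (gi - mi) for w, gi, mi in zip(weights, g, mu_hat))
    q = sum(w * (mi - gi + eta_hat) ** 2 for w, gi, mi in zip(weights, g, mu_hat))
    return math.sqrt(s2), eta_hat, q


def quadratic_form(eta, mu_hat, g, sigma):
    """Q(eta) = sum of ((mu_hat_i - g_i + eta) / sigma_i)^2."""
    return sum(((mi - gi + eta) / sg) ** 2 for mi, gi, sg in zip(mu_hat, g, sigma))


def marginal_likelihood(mu_hat, g, sigma, delta_eta):
    """Closed-form marginal likelihood under a prior flat in eta of width delta_eta."""
    s, _, q = complete_the_square(mu_hat, g, sigma)
    return s * math.sqrt(2.0 * math.pi) / delta_eta * math.exp(-0.5 * q)


def marginal_likelihood_numeric(mu_hat, g, sigma, eta_lo, eta_hi, n=20000):
    """Trapezoid-rule evaluation of (1 / delta_eta) * integral of exp(-Q/2) d(eta)."""
    step = (eta_hi - eta_lo) / n
    total = 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        eta = eta_lo + k * step
        total += weight * math.exp(-0.5 * quadratic_form(eta, mu_hat, g, sigma))
    return total * step / (eta_hi - eta_lo)
```

Because `s` depends only on the uncertainties, the prefactor is the same for every choice of density parameters, which is the proportionality to the profile likelihood noted above.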

It is also possible to do the calculation analytically using a flat prior for $h$, spanning a 
prior range $\Delta h$. The corresponding prior for $\eta$ is exponential; 

$p(\eta|M) = \frac{1}{2a\, \Delta h}\, e^{\eta/2a}$,    (B15) 

where $a = 2.5 \log e$, a constant known as Pogson's ratio (Pogson 1856). The product 
of the likelihood and the prior can still be written as $e^{-Q/2}$ with $Q$ quadratic in $\eta$; but 
there is an additional linear term in $Q$ from the prior. Completing the square duplicates 
equation (B10), but with $\hat\eta$ replaced with 

$\tilde\eta = s^2 \left[ \frac{1}{2a} + \sum_i \frac{g_i - \hat\mu_i}{\sigma_i^2} \right]$    (B16) 

and $q$ replaced with $\tilde q = q - s^2/4a^2 - \hat\eta/a$. 
The marginal likelihood for $\Omega_M$ and $\Omega_\Lambda$ also has a different factor out front; it is given by 

$\mathcal{L}(\Omega_M, \Omega_\Lambda) = \frac{s\sqrt{2\pi}}{2a\, \Delta h}\, e^{-\tilde q/2}$.    (B17) 

We present this result for reference only; we use the scale-invariant prior in the body of this 
work and in the remainder of this Appendix. We have compared calculations with flat and 
log-flat priors for some models; the resulting marginal likelihoods are negligibly different. 
Note that equation (B17) is not proportional to the profile likelihood; the proportionality is 
a special property of the scale-invariant prior. 



B.3. Systematic Error in $H_0$ 

Among the lightcurve model parameters, regardless of the method, is the fiducial 
absolute magnitude for SNe Ia, $M_0$. To obtain definite values for the distance moduli, 
$M_0$ must be estimated or at least arbitrarily specified. Let $\hat M_0$ denote the value used to 
calculate the tabulated $\hat\mu_i$ estimates. We can write the true value as 

$M_0 = \hat M_0 + \delta$,    (B18) 

where $\delta$ is an uncertain error in our estimate. Since the $\hat\mu_i$ estimates are calculated using 
$\hat M_0$, they will have an additive error equal to $\delta$ that is systematic (the same for every SN 
Ia). To account for this, equation (B7) must be replaced by 

$\hat\mu_i = g_i - \eta + \delta + n_i$.    (B19) 

Note here the degeneracy between $\eta$ and $\delta$; since they play identical roles (up to a sign) 
in the model for the distance moduli, they cannot be individually constrained using only 
magnitude/redshift data; additional information setting a distance scale to at least one SN 
is required. Only the quantity $\gamma = \delta - \eta$ can be inferred from the basic data. 

P98 arbitrarily specify $M_0$, so there is no useful information about $\delta$ that can break the
degeneracy between $\delta$ and $\eta$. Recognizing this, they simply forgo any attempt to infer the
Hubble constant. Their analysis amounts to replacing $\eta$ and $\delta$ with $\gamma$ and marginalizing
over $\gamma$ with a flat prior; the resulting marginal likelihood for the density parameters is of
the same form as equation (B14), though with an arbitrarily large prior range for $\gamma$ (which
can be ignored since it is common to all models being compared). This is the likelihood we
used for the analyses of LBL data (and simulated data) described in § 4 when we assume
no evolutionary effects are present.

R98 use Cepheid distances for three SNe Ia to estimate $M_0$ for use with the MLCS
and TF methods. We can consider this extra data to provide a prior distribution for $\delta$; this
prior breaks the degeneracy between $\eta$ and $\delta$ in the analysis. R98 report a 10% uncertainty
in the Cepheid distance scale for SNe Ia, corresponding to 0.21 magnitude uncertainty in
distance moduli. We accordingly adopt a Gaussian prior for $\delta$ with zero mean and standard
deviation $d = 0.21$, so that

$$p(\delta) = \frac{1}{d\sqrt{2\pi}}\, e^{-\delta^2/2d^2}. \tag{B20}$$
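As a quick check of the quoted conversion, a fractional distance error maps to a distance-modulus error through the standard relation between distance modulus and distance, $\mu = 5\log_{10} d + \mathrm{const}$. The sketch below assumes the 10% enters as a multiplicative factor of 1.1 on distance; the function name is illustrative and is not taken from R98.

```python
import math

def distance_error_to_mag(fractional_error):
    """Shift in distance modulus (mag) caused by a fractional distance error.

    Assumes mu = 5 log10(distance) + const, so scaling the distance by
    (1 + fractional_error) shifts mu by 5 log10(1 + fractional_error).
    """
    return 5.0 * math.log10(1.0 + fractional_error)

d_mag = distance_error_to_mag(0.10)
print(round(d_mag, 2))  # 0.21
```
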



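We can calculate the likelihood for the cosmological parameters by multiplying the joint
likelihood for them and $\delta$ by this prior, and integrating over $\delta$, as follows.

The quadratic form in the exponential resulting from multiplying this prior by the
likelihood resulting from equation (B19) is

$$Q = \frac{\delta^2}{d^2} + \sum_i \frac{(\hat\mu_i - g_i + \eta - \delta)^2}{\sigma_i^2}
    = \frac{(\delta - \hat\delta)^2}{s^2} + q(\eta, \Omega_M, \Omega_\Lambda), \tag{B21}$$

where

$$\frac{1}{s^2} = \frac{1}{d^2} + \sum_i \frac{1}{\sigma_i^2}, \tag{B22}$$

$$\hat\delta(\eta, \Omega_M, \Omega_\Lambda) = s^2 \sum_i \frac{\hat\mu_i - g_i + \eta}{\sigma_i^2}, \tag{B23}$$

and the $(\eta, \Omega_M, \Omega_\Lambda)$-dependence is isolated in

$$q(\eta, \Omega_M, \Omega_\Lambda) = -\frac{\hat\delta^2}{s^2} + \sum_i \frac{(\hat\mu_i - g_i + \eta)^2}{\sigma_i^2}. \tag{B24}$$

As with $\eta$ in the previous subsection, the integral over $\delta$ is a simple Gaussian integral, equal
to $s\sqrt{2\pi}$. Thus the marginal likelihood for the cosmology parameters is

$$\mathcal{L}(\eta, \Omega_M, \Omega_\Lambda) = \frac{s}{d}\, e^{-q/2}. \tag{B25}$$

This is the likelihood used for the analyses of the MLCS and TF data using the default
model, as reported in § 4.1.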
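The Gaussian integral over $\delta$ can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes residuals $\hat\mu_i - g_i + \eta$, a zero-mean Gaussian prior of width $d$ on $\delta$, and that constant factors $1/(\sigma_i\sqrt{2\pi})$ are dropped. The function names and the toy data are made up for illustration.

```python
import math

def marginal_likelihood(mu_hat, g, sigma, eta, d=0.21):
    """Closed form (s/d) * exp(-q/2) for the delta-marginalized likelihood."""
    w = [1.0 / s ** 2 for s in sigma]                       # inverse variances
    r = [m - gi + eta for m, gi in zip(mu_hat, g)]          # residuals at delta = 0
    s2 = 1.0 / (1.0 / d ** 2 + sum(w))                      # squared delta uncertainty
    delta_hat = s2 * sum(wi * ri for wi, ri in zip(w, r))   # conditional best-fit delta
    q = sum(wi * ri ** 2 for wi, ri in zip(w, r)) - delta_hat ** 2 / s2
    return math.sqrt(s2) / d * math.exp(-0.5 * q)

def brute_force(mu_hat, g, sigma, eta, d=0.21, half_width=3.0, n=20001):
    """Trapezoid-rule integral over delta of prior(delta) * exp(-chi^2 / 2)."""
    step = 2.0 * half_width / (n - 1)
    norm = 1.0 / (d * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(n):
        delta = -half_width + k * step
        chi2 = sum(((m - gi + eta - delta) / s) ** 2
                   for m, gi, s in zip(mu_hat, g, sigma))
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * norm * math.exp(-0.5 * (delta / d) ** 2 - 0.5 * chi2)
    return total * step

# Made-up toy data (not from R98 or P98).
mu_hat = [34.10, 36.00, 42.30]
g = [34.00, 36.10, 42.00]
sigma = [0.20, 0.15, 0.25]

analytic = marginal_likelihood(mu_hat, g, sigma, eta=0.0)
numeric = brute_force(mu_hat, g, sigma, eta=0.0)
print(analytic, numeric)
```

The two numbers should agree closely, because the integrand is an exact Gaussian in $\delta$ and the grid is much finer than its width.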



B.4. Systematic Error From Evolution

The simplest model we considered with a redshift-dependent systematic or evolutionary
component is Model I, which adds a shift of size $\epsilon$ to the distance moduli of the high redshift
SNe Ia. For this model,

$$\hat\mu_i = \begin{cases} f_i + \delta + n_i & \text{if } z_i < z_c, \\ f_i + \delta + \epsilon + n_i & \text{if } z_i > z_c, \end{cases} \tag{B26}$$

with $z_c = 0.15$. We seek the marginal likelihood for the cosmological parameters, requiring
us to introduce priors for $\delta$ and $\epsilon$ and marginalize over them.



The prior for $\delta$ is given by equation (B20), and the prior for $\epsilon$ is similarly a zero-mean
Gaussian, but with a different standard deviation, $e$:

$$p(\epsilon) = \frac{1}{e\sqrt{2\pi}} \exp\left(-\frac{\epsilon^2}{2e^2}\right). \tag{B27}$$

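The quadratic form associated with the product of these priors and the likelihood function
is

$$Q = \frac{\delta^2}{d^2} + \frac{\epsilon^2}{e^2}
    + \sum_{z_i<z_c} \frac{(\hat\mu_i - f_i - \delta)^2}{\sigma_i^2}
    + \sum_{z_i>z_c} \frac{(\hat\mu_i - f_i - \delta - \epsilon)^2}{\sigma_i^2}. \tag{B28}$$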



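To marginalize over $\epsilon$, we complete the square in $\epsilon$ by introducing the $\epsilon$ uncertainty $t$, given
by

$$\frac{1}{t^2} = \frac{1}{e^2} + \sum_{z_i>z_c} \frac{1}{\sigma_i^2}, \tag{B29}$$

and the conditional best-fit value of $\epsilon$,

$$\hat\epsilon(\delta) = t^2 \sum_{z_i>z_c} \frac{\hat\mu_i - f_i - \delta}{\sigma_i^2}. \tag{B30}$$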



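After completing the square and integrating the resulting Gaussian dependence on $\epsilon$, we
find that

$$p(\delta)\,\mathcal{L}(\delta, C) = \frac{t}{e\,d\,\sqrt{2\pi}}\, e^{-q/2}, \tag{B31}$$

where

$$q = \frac{\delta^2}{d^2} - \frac{\hat\epsilon^2}{t^2} + \sum_i \frac{(\hat\mu_i - f_i - \delta)^2}{\sigma_i^2}. \tag{B32}$$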



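Note that the sum is over all SNe, and that $\delta$ appears in $\hat\epsilon$. Completing the square in $\delta$ lets
us identify the $\delta$ uncertainty, $s$, given by

$$\frac{1}{s^2} = \frac{1}{d^2} + \sum_i \frac{1}{\sigma_i^2} - t^2 F^2, \tag{B33}$$

and the conditional estimate for $\delta$,

$$\hat\delta(C) = s^2 \left[ \sum_i \frac{\hat\mu_i - f_i}{\sigma_i^2} - t^2 \nu F \right], \tag{B34}$$

where in these equations we have defined $\nu$ and $F$ according to

$$\nu = \sum_{z_i>z_c} \frac{\hat\mu_i - f_i}{\sigma_i^2}, \tag{B35}$$

and

$$F = \sum_{z_i>z_c} \frac{1}{\sigma_i^2}. \tag{B36}$$

Using these, we can rewrite $q$ as

$$q = \frac{(\delta - \hat\delta)^2}{s^2} + q'(C), \tag{B37}$$

where the dependence on the cosmological parameters is in

$$q'(C) = -\frac{\hat\delta^2}{s^2} + \sum_i \frac{(\hat\mu_i - f_i)^2}{\sigma_i^2} - t^2 \nu^2. \tag{B38}$$

After integrating over the Gaussian dependence on $\delta$, the marginal likelihood is

$$\mathcal{L}(C) = \frac{t s}{e d}\, e^{-q'/2}. \tag{B39}$$

This is the likelihood used for analyses of the MLCS and TF data based on Model I.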
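The two successive Gaussian marginalizations for Model I can be cross-checked against a direct two-dimensional integration. The sketch below is illustrative only: it assumes the predicted moduli $f_i$ are supplied directly, that $\epsilon$ is added for $z_i > z_c$, and that the helper sums $\nu$ and $F$ run over the high-redshift objects. All function names and the toy data are made up.

```python
import math

def model1_likelihood(mu_hat, f, sigma, z, d=0.21, e=0.2, z_c=0.15):
    """Closed form (t s / (e d)) * exp(-q'/2) after marginalizing epsilon, then delta."""
    w = [1.0 / s ** 2 for s in sigma]
    r = [m - fi for m, fi in zip(mu_hat, f)]
    high = [zi > z_c for zi in z]                           # high-redshift mask
    F = sum(wi for wi, hi in zip(w, high) if hi)
    nu = sum(wi * ri for wi, ri, hi in zip(w, r, high) if hi)
    t2 = 1.0 / (1.0 / e ** 2 + F)                           # squared epsilon uncertainty
    s2 = 1.0 / (1.0 / d ** 2 + sum(w) - t2 * F ** 2)        # squared delta uncertainty
    delta_hat = s2 * (sum(wi * ri for wi, ri in zip(w, r)) - t2 * nu * F)
    q_prime = (sum(wi * ri ** 2 for wi, ri in zip(w, r))
               - t2 * nu ** 2 - delta_hat ** 2 / s2)
    return math.sqrt(t2 * s2) / (e * d) * math.exp(-0.5 * q_prime)

def brute_force(mu_hat, f, sigma, z, d=0.21, e=0.2, z_c=0.15,
                half_width=1.5, n=301):
    """Grid sum over (delta, epsilon) of both priors times exp(-chi^2 / 2)."""
    step = 2.0 * half_width / (n - 1)
    grid = [-half_width + k * step for k in range(n)]
    norm = 1.0 / (2.0 * math.pi * d * e)
    total = 0.0
    for delta in grid:
        for eps in grid:
            chi2 = sum(((m - fi - delta - (eps if zi > z_c else 0.0)) / s) ** 2
                       for m, fi, s, zi in zip(mu_hat, f, sigma, z))
            total += math.exp(-0.5 * ((delta / d) ** 2 + (eps / e) ** 2 + chi2))
    return norm * total * step ** 2

# Made-up toy data: two low-redshift and two high-redshift objects.
z = [0.02, 0.05, 0.40, 0.80]
mu_hat = [34.10, 36.00, 41.90, 43.70]
f = [34.00, 36.10, 41.70, 43.50]
sigma = [0.20, 0.15, 0.20, 0.25]

analytic = model1_likelihood(mu_hat, f, sigma, z)
numeric = brute_force(mu_hat, f, sigma, z)
print(analytic, numeric)
```
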

For Model II, used to model the SF data, the estimated distance moduli are given by

$$\hat\mu_i = g_i + \gamma + \beta h_i + n_i, \tag{B40}$$

where as before $\gamma = \delta - \eta$, and $h_i = \ln(1 + z_i)$. We will marginalize over $\gamma$ and $\beta$, using a
flat prior for $\gamma$ and a zero-mean Gaussian prior for $\beta$ with standard deviation $b$.

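As already noted, the $\gamma$ marginalization is similar to the $\eta$ marginalization already
treated above. The result is

$$p(\beta)\,\mathcal{L}(\beta, \Omega_M, \Omega_\Lambda) = \frac{s}{\Delta\gamma\, b}\, e^{-q/2}, \tag{B41}$$

where $s$ follows from the likelihood alone because the prior for $\gamma$ is flat, $1/s^2 = \sum_i 1/\sigma_i^2$,

$$\hat\gamma(\beta, \Omega_M, \Omega_\Lambda) = s^2 \sum_i \frac{\hat\mu_i - g_i - \beta h_i}{\sigma_i^2}, \tag{B42}$$

and the $(\beta, \Omega_M, \Omega_\Lambda)$-dependence is isolated in

$$q(\beta, \Omega_M, \Omega_\Lambda) = \frac{\beta^2}{b^2} + \sum_i \frac{(\hat\mu_i - g_i - \beta h_i - \hat\gamma)^2}{\sigma_i^2}
  = \frac{\beta^2}{b^2} - \frac{\hat\gamma^2}{s^2} + \sum_i \frac{(\hat\mu_i - g_i - \beta h_i)^2}{\sigma_i^2}. \tag{B43}$$

We assume that the prior range for $\gamma$, $\Delta\gamma$, contains the peak of the Gaussian. Since this
range is common to all models for this data and thus cancels in all calculations, we do not
need to specify it more precisely, and we simply drop it from subsequent calculations.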

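Note that $\hat\gamma$ depends on $\beta$; we can isolate this dependence by writing

$$\hat\gamma = s^2 H - \beta s^2 G, \tag{B44}$$

where

$$H = \sum_i \frac{\hat\mu_i - g_i}{\sigma_i^2}, \tag{B45}$$

and

$$G = \sum_i \frac{h_i}{\sigma_i^2}. \tag{B46}$$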

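This helps us to do the remaining marginalization over $\beta$. We now complete the square in
$\beta$, identifying the $\beta$ uncertainty, $\tau$, given by

$$\frac{1}{\tau^2} = \frac{1}{b^2} + \sum_i \frac{h_i^2}{\sigma_i^2} - s^2 G^2, \tag{B47}$$

and the conditional best-fit $\beta$,

$$\hat\beta = \tau^2 \left[ \sum_i \frac{h_i (\hat\mu_i - g_i)}{\sigma_i^2} - s^2 H G \right]. \tag{B48}$$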

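Integrating over the Gaussian dependence on $\beta$ gives a factor of $\tau\sqrt{2\pi}$, and the final
likelihood for the density parameters is

$$\mathcal{L}(\Omega_M, \Omega_\Lambda) = \frac{\sqrt{2\pi}\, s\, \tau}{b}\, e^{-q'/2}, \tag{B49}$$

where

$$q' = -\frac{\hat\beta^2}{\tau^2} - s^2 H^2 + \sum_i \frac{(\hat\mu_i - g_i)^2}{\sigma_i^2}. \tag{B50}$$

This is the likelihood used for the calculations with Model II in § 4.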
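One simple sanity check on the Model II result is its limit as the prior width $b$ shrinks to zero, where it should reduce to the $\gamma$-marginalized likelihood with no evolution. The sketch below assumes a flat prior on $\gamma$ with the range $\Delta\gamma$ dropped, $h_i = \ln(1+z_i)$, and constant factors $1/(\sigma_i\sqrt{2\pi})$ omitted. The function names and toy data are hypothetical.

```python
import math

def model2_likelihood(mu_hat, g, sigma, z, b):
    """Closed form sqrt(2 pi) * s * tau / b * exp(-q'/2) for Model II."""
    w = [1.0 / s ** 2 for s in sigma]
    r = [m - gi for m, gi in zip(mu_hat, g)]
    h = [math.log(1.0 + zi) for zi in z]
    s2 = 1.0 / sum(w)                                       # squared gamma uncertainty
    H = sum(wi * ri for wi, ri in zip(w, r))
    G = sum(wi * hi for wi, hi in zip(w, h))
    tau2 = 1.0 / (1.0 / b ** 2
                  + sum(wi * hi ** 2 for wi, hi in zip(w, h))
                  - s2 * G ** 2)                            # squared beta uncertainty
    beta_hat = tau2 * (sum(wi * hi * ri for wi, hi, ri in zip(w, h, r))
                       - s2 * H * G)
    q_prime = (sum(wi * ri ** 2 for wi, ri in zip(w, r))
               - s2 * H ** 2 - beta_hat ** 2 / tau2)
    return math.sqrt(2.0 * math.pi * s2 * tau2) / b * math.exp(-0.5 * q_prime)

def no_evolution_likelihood(mu_hat, g, sigma):
    """Gamma-marginalized likelihood with beta fixed at zero."""
    w = [1.0 / s ** 2 for s in sigma]
    r = [m - gi for m, gi in zip(mu_hat, g)]
    s2 = 1.0 / sum(w)
    H = sum(wi * ri for wi, ri in zip(w, r))
    q = sum(wi * ri ** 2 for wi, ri in zip(w, r)) - s2 * H ** 2
    return math.sqrt(2.0 * math.pi * s2) * math.exp(-0.5 * q)

# Made-up toy data (not the SF sample).
z = [0.02, 0.05, 0.40, 0.80]
mu_hat = [34.10, 36.00, 41.90, 43.70]
g = [34.00, 36.10, 41.70, 43.50]
sigma = [0.20, 0.20, 0.20, 0.20]

tight = model2_likelihood(mu_hat, g, sigma, z, b=1e-4)      # evolution almost forbidden
loose = model2_likelihood(mu_hat, g, sigma, z, b=0.5)       # evolution allowed
reference = no_evolution_likelihood(mu_hat, g, sigma)
print(tight / reference, loose / reference)
```

The first ratio should be very close to one; the second shows how much the likelihood at fixed cosmology changes once evolution is allowed.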
Fig. 1. — Joint credible regions for $\Omega_\Lambda$ versus $\Omega_M$ for the data set of P98. The 68.3%, 95.4%
and 99.7% confidence level contours are shown for (a) the data as published and (b) after
introducing a systematic offset of -0.1 magnitudes to the high redshift (z > 0.15) sample.

Fig. 2. — Histogram of the corrections to the absolute magnitudes of the observed SNe Ia,
in magnitudes, deduced from the MLCS, TF, and SF methods.

Fig. 3. — (a) $\mu_{MLCS}$ versus $\mu_{TF}$ for the 10 high redshift SNe Ia of R98. (b) $\mu_{MLCS}$ versus
$\mu_{SF}$ for the 14 low redshift SNe Ia from the Calan Tololo survey that are used in both R98
and P98. The dashed lines are straight line fits to the data where the slope of the line is
fixed to 1.

Fig. 4. — Scatterplots comparing SNe Ia properties inferred using the MLCS and TF methods
for the R98 sample of SNe Ia. Compared are (a) the host galaxy extinction, $A$, for the 37 well
measured SNe Ia; (b) the correction to the absolute magnitude, $\Delta$, for the 37 well measured
SNe Ia; and (c) the peak apparent magnitude, $m$, of the 10 well measured SNe Ia at high
redshift. The errors on $A$ and $\Delta$ can be estimated to be of the order of 0.1 magnitudes. The
dashed lines each have a slope of 1.

Fig. 5. — The difference between the distance moduli inferred using the MLCS and TF
lightcurve fitting methods, $\Delta\mu = \mu_{MLCS} - \mu_{TF}$, as a function of redshift $z$. The errors on
the data points are described in the text.

Fig. 6. — The difference between the distance moduli, $\Delta\mu = \mu_{MLCS} - \mu_{TF}$, is plotted versus
an estimate of the absolute magnitude, M^^ , for SNe Ia at low redshift (z < 0.15) in the left
hand plot and high redshift (z > 0.15) in the right hand plot. The dashed line in the right
hand plot is the result of a least squares fit to the data which gives a slope of -0.6 ± 0.15.

Fig. 7. — The 68.3% joint credible regions plotted separately for intrinsically dim and
intrinsically bright SNe Ia for the (a) MLCS, (b) TF, and (c) SF analysis methods. The
contours for the full data set (all M^^) are shown as dashed curves.

Fig. 8. — Marginal posterior distribution for $h$ using the MLCS and TF estimated distance
moduli. The labeled solid curves show results using our reference model incorporating
systematic uncertainties in the Cepheid distances for the SNe Ia used to set the absolute
magnitude scale of SNe Ia. The dotted curves show results using Model I, with the prior
uncertainty for the high-z offset $e = 0.2$ mag.

Fig. 9. — The 68.3%, 95.4% and 99.7% joint credible regions for $\Omega_M$ and $\Omega_\Lambda$ based on our
reference model, using distance moduli calculated with the (a) MLCS and (b) TF lightcurve
fitting methods.

Fig. 10. — The 68.3%, 95.4% and 99.7% joint credible regions for $\Omega_M$ and $\Omega_\Lambda$ based on
Model I. Panels (a) and (b) are for MLCS and TF, respectively, for $e = 0.1$, and (c) and (d)
are for MLCS and TF, respectively, with $e = 0.2$.

Fig. 11. — The 68.3%, 95.4% and 99.7% joint credible regions for Model II applied to distance
moduli calculated with the SF lightcurve fitting method. Panel (a) is for no evolution, and
shown for reference. Panels (b) and (c) are for $b = 0.25$ and $b = 0.5$, respectively.

Fig. 12. — The solid line shows the Bayes factor for Q\ ^ versus Q\ = as a function of 
the parameter b of Model II, and the dashed line shows the Bayes factor for Model II versus 
the zero-evolution reference model. 

Fig. 13. — Marginal posterior distributions for $\Omega_M$ (and, equivalently, for $\Omega_\Lambda = 1 - \Omega_M$)
presuming a flat cosmology, using data from the MLCS (top), TF (middle), and SF (bottom) 
methods. Results are shown presuming no evolution (solid curves), allowing a small amount 
of evolution (short-dashed), and allowing a larger amount of evolution (long-dashed). 

Fig. 14. — Comparison of cosmological models with and without evolution. The thick
solid line is for the best-fit cosmology for SF presuming no evolution, $(\Omega_M, \Omega_\Lambda) = (0.75, 1.34)$.
The dotted curve is for $(\Omega_M, \Omega_\Lambda) = (1, 0)$ without evolution; the dashed curve is $\beta \ln(1 + z)$
with the best-fit value $\beta = 0.83$ for this cosmology. The thin solid line is the sum, and depicts
the best fit presuming $(\Omega_M, \Omega_\Lambda) = (1, 0)$ with evolution included. The bold, long-dashed line
is for $(\Omega_M, \Omega_\Lambda) = (0.3, 0.7)$ without evolution.

Fig. 15. — Results of analyzing simulated data with $(\Omega_M, \Omega_\Lambda, \beta) = (1, 0, 0.5)$. Panels (a) and
(b) are for analyses presuming no evolution using data sets with 38 and 186 high-redshift 
(0.3 < z < 1.5) sources, respectively; both data sets had 14 low-redshift sources. Panel (c) 
repeats the analysis of the larger data set with evolution included in the model. The crosses 
indicate the best-fit parameter values from the analyses; the dots indicate the true values 
used to generate the data. 
Table 1. Dispersion of the data, in magnitudes, from the best fit cosmology for low z
(z < 0.15) and high z (z > 0.15).

Fitting Method   Sample    With Corrections   Without Corrections
MLCS             low z     0.18 ± 0.02        0.33 ± 0.05
MLCS             high z    0.22 ± 0.05        0.20 ± 0.04
TF               low z     0.20 ± 0.03        0.33 ± 0.05
TF               high z    0.17 ± 0.04        0.20 ± 0.05
SF               low z     0.18 ± 0.03        0.18 ± 0.03
SF               high z    0.30 ± 0.03        0.30 ± 0.03



Table 2. Interpretation of Bayes Factors

ln B_ij    B_ij         Strength of evidence for H_i over H_j
0 to 1     1 to 3       Not worth more than a bare mention
1 to 3     3 to 20      Positive
3 to 5     20 to 150    Strong
> 5        > 150        Very Strong
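The verbal scale of Table 2 can be applied mechanically to a computed Bayes factor. This is a minimal sketch using the thresholds 3, 20 and 150 on the Bayes factor itself; the function name is hypothetical, and the treatment of values exactly on a boundary is an assumption because the table does not specify it.

```python
import math

def evidence_strength(bayes_factor):
    """Verbal strength of evidence for H_i over H_j, given B_ij >= 1."""
    if bayes_factor < 1.0:
        # The scale is stated for H_i over H_j; invert B_ij to read it the other way.
        raise ValueError("expected B_ij >= 1; swap the hypotheses and use 1 / B_ij")
    if bayes_factor < 3.0:
        return "Not worth more than a bare mention"
    if bayes_factor < 20.0:
        return "Positive"
    if bayes_factor < 150.0:
        return "Strong"
    return "Very Strong"

print(evidence_strength(2.0))            # Not worth more than a bare mention
print(evidence_strength(math.exp(4.0)))  # Strong (ln B = 4)
```
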